Social relationships present a critical foundation for many
real-world applications. However, both users and online social
network (OSN) providers are hesitant to share social relationships
with untrusted external applications due to privacy
concerns. In this work, we design LinkMirage, a system 
that mediates privacy-preserving access to social relationships. 
LinkMirage takes users’ social relationship graph as an input, 
obfuscates the social graph topology, and provides untrusted 
external applications with an obfuscated view of the social 
relationship graph while preserving graph utility. 

Our key contributions are (1) a novel algorithm for obfuscating
social relationship graphs while preserving graph utility,
and (2) theoretical and experimental analysis of privacy and
utility using real-world social network topologies, including
a large-scale Google+ dataset with 940 million links. Our
experimental results demonstrate that LinkMirage provides up
to 10x improvement in privacy guarantees compared to the
state-of-the-art approaches. Overall, LinkMirage enables the
design of real-world applications such as recommendation systems,
graph analytics, anonymous communications, and Sybil
defenses while protecting the privacy of social relationships.

I. Introduction 

Online social networks (OSNs) have revolutionized the way 
our society interacts and communicates with each other. Under 
the hood, OSNs can be viewed as a special graph structure 
composed of individuals (or organizations) and connections 
between these entities. These social relationships represent
sensitive relationships between entities, for example, trusted
friendships or important interactions in Facebook, Twitter, or
Google+, whose security and privacy users want to preserve.

At the same time, an increasing number of third-party applications
rely on users’ social relationships (these applications
can be external to the OSN). E-commerce applications can
leverage social relationships for improving sales [?], and data-mining
researchers also rely on social relationships for
functional analysis [?], [35]. Social relationships can be used
to mitigate spam [28]. Anonymous communication systems
can improve client anonymity by leveraging users’ social
relationships [?], [?], [?]. State-of-the-art Sybil defenses
rely on social trust relationships to detect attackers [?], [26],
[?].

However, both users and the OSN providers are hesitant to 
share social relationships/graphs with these applications due 
to privacy concerns. For instance, a majority of users are 
exercising privacy controls provided by popular OSNs such 
as Facebook, Google+ and LinkedIn to limit access to their
social relationships. Privacy concerns arise because external
applications that rely on users’ social relationships can either
explicitly reveal this information to an adversary, or allow
the adversary to perform inference attacks [14], [?], [?],
[?], [?]. These concerns hinder the deployment of
many real-world applications. Thus, there exist fundamentally 
conflicting requirements for any link obfuscation mechanism: 
protecting privacy for the sensitive links in social networks and 
preserving utility of the obfuscated graph for use in real-world 
applications. 

In this work, we design LinkMirage, a system that mediates 
privacy-preserving access to social relationships. LinkMirage 
takes users’ social relationship graph as an input, either via 
an OSN operator or via individual user subscriptions. Next, 
LinkMirage obfuscates the social graph topology to protect the 
privacy of users’ social contacts (edge/link privacy, not vertex 
privacy). LinkMirage then provides external applications such
as graph analytics and anonymity systems [?], [?]
with an obfuscated view of the social relationship graph.
Thus, LinkMirage provides a trade-off between securing the 
confidentiality of social relationships, and enabling the design 
of social relationship based applications. 

We present a novel obfuscation algorithm that first clusters 
social graphs, and then anonymizes intra-cluster links and 
inter-cluster links, respectively. We obfuscate links in a manner 
that preserves the key structural properties of social graphs. 
While our approach is of interest even for static social graphs, 
we go a step further in this paper, and consider the evolutionary 
dynamics of social graphs (node/link addition or deletion). 
We design LinkMirage to be resilient to such evolutionary 
dynamics, by consistently clustering social graphs across time 
instances. Consistent clustering improves both the privacy and 
utility of the obfuscated graphs. We show that LinkMirage 
provides strong privacy properties. Even a strategic adversary 
with full access to the obfuscated graph and prior information 
about the original social graph is limited in its ability to 
infer information about users’ social relationships. LinkMirage 
provides up to 3x privacy improvement in static settings, and
up to 10x privacy improvement in dynamic settings compared
to the state-of-the-art approaches.


Overall, our work makes the following contributions. 

• First, we design LinkMirage to mediate privacy-preserving 
access to users’ social relationships. LinkMirage obfuscates 
links in the social graph (link privacy) and provides 
untrusted external applications with an obfuscated view of 
the social graph. LinkMirage can achieve a good balance
between privacy and utility in the context of both static
and dynamic social network topologies.

• Second, LinkMirage provides rigorous privacy guarantees to
defend against strategic adversaries with prior information
about the social graph. We perform link privacy analysis both
theoretically and using real-world social network
topologies. The experimental results for both a Facebook
dataset (with 870K links) and a large-scale Google+ dataset
(with 940M links) show up to 10x improvement in privacy
over the state-of-the-art approaches.

• Third, we experimentally demonstrate the applicability of 
LinkMirage in real-world applications, such as privacy-preserving
graph analytics, anonymous communication and
Sybil defenses. LinkMirage enables the design of social 
relationships based systems while simultaneously protecting 
the privacy of users’ social relationships. 

• Finally, we quantify a general utility metric for LinkMirage.
We analyze the utility provided by LinkMirage both
theoretically and using real-world social graphs
(Facebook and Google+).

II. Background
A. Motivating Applications 

In this paper, we focus our research on protecting the
link privacy between labeled vertices in social networks [?],
[?], [?]. Mechanisms for graph analytics, anonymous
communication, and Sybil defenses can leverage users’ social
relationships for enhancing security, but end up revealing
users’ social relationships to adversaries. For example, in the
Tor network [?], the relays’ IP addresses (labels) are already
publicly known (so vertex privacy as in [?], [?], [45] is not
useful). Tor operators are hesitant to utilize social trust to
set up the Tor circuit as recommended by [30], [?], since
the circuit construction protocol would reveal sensitive social
contact information about the users. Our proposed link-privacy
techniques can thus be utilized by the Tor relay operators to
enhance system security while preserving link privacy. Overall,
our work focuses on protecting users’ trust relationships while
enabling the design of such systems.

LinkMirage supports three categories of social relationship
based applications: 1) Global access to the obfuscated graph:
Applications such as social network based anonymity systems
[?], [?], and peer-to-peer networks [?] can utilize
LinkMirage (described in Section III-B) to obtain a global
view of privacy-preserving social graph topologies; 2) Local
access to the obfuscated graph: an individual user can query
LinkMirage for his/her obfuscated social relationships (local
neighborhood information), to facilitate distributed applications
such as SybilLimit [43]; 3) Mediated data analytics:
LinkMirage can enable privacy-preserving data analytics by
running desired functional queries (such as computing graph
modularity and pagerank) on the obfuscated graph topology
and only returning the result of the query. Existing work
[?], [?] demonstrated that the implementation of graph
analytics algorithms could leak certain information. Instead of
repeatedly adding perturbations to the output of each graph
analytics algorithm as in differential privacy [?], [?], which
would be rather costly, LinkMirage can obtain the perturbed
graph just once to support multiple graph analytics. Such an
approach protects the privacy of users’ social relationships
from inference attacks using query results. There exists a
plethora of attacks against vertex anonymity based mechanisms
[?], [?], [?], [?]. Ji et al. [19] recently showed that no
single vertex anonymization technique was able to resist all
the existing attacks. Note that these attacks are not applicable
to link privacy schemes. Therefore, a sound approach to vertex
anonymity must start with improvements in our understanding
of link privacy. When used as a first step in the design of vertex
privacy mechanisms, our approach can protect the privacy of
social contacts and graph links even when the vertices are
de-anonymized using state-of-the-art approaches [?], [?], [?],
[?]. Furthermore, our method can even improve the resilience
of vertex anonymity mechanisms against de-anonymization
attacks when applied to unlabelled graphs (as will be shown in
Section V-B).

Fig. 1. LinkMirage architecture. LinkMirage first collects social link information
through our social link app or directly through the OSN providers, and
then applies an obfuscation algorithm to perturb the original social graph(s).
The obfuscated graph(s) are then used to answer the queries of the untrusted
applications in a privacy-preserving manner. The third-party application (which
queries the social link information) is considered an adversary which aims to
obtain sensitive link information from the perturbed query results.


B. System Architecture and Threat Model 


Fig. 1 shows the overall architecture for LinkMirage. For
link privacy, we consider the third-party applications (which
can query the social link information) as adversaries, which
aim to obtain sensitive link information from the perturbed
query results. A sophisticated adversary may have access to
certain prior information such as partial link information of
the original social networks, and such prior information can
be extracted from publicly available sources, social networks
such as Facebook, or other application-related sources as stated
in [?]. The adversary may leverage Bayesian inference to infer
the probability for the existence of a link. We assume that
LinkMirage itself is trusted, in addition to the social network
providers/users who provide the input social graph.
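To make the Bayesian adversary concrete, the sketch below applies Bayes' rule to a single candidate link: the adversary combines a prior belief that the link exists (for example, from publicly available partial link information) with the likelihood of what it observes in the obfuscated graph. The function name and both likelihood parameters are hypothetical; this illustrates the inference step only and is not LinkMirage's privacy metric.

```python
def link_posterior(prior, p_obs_given_link, p_obs_given_no_link):
    """Posterior P(link exists | observation) via Bayes' rule.

    prior: adversary's prior probability that the link exists.
    p_obs_given_link: hypothetical probability of the observation if the link exists.
    p_obs_given_no_link: hypothetical probability of the observation otherwise.
    """
    joint_link = prior * p_obs_given_link
    joint_no_link = (1.0 - prior) * p_obs_given_no_link
    evidence = joint_link + joint_no_link
    if evidence == 0.0:
        # The observation is impossible under both hypotheses; keep the prior.
        return prior
    return joint_link / evidence
```

For example, `link_posterior(0.1, 0.6, 0.05)` raises a 10% prior to about 57%, whereas equal likelihoods (`link_posterior(0.1, 0.3, 0.3)`) leave the prior unchanged at 0.1. Intuitively, the more the obfuscation makes both likelihoods similar, the less the adversary learns.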

In Sections IV-B and IV-C, we define our Bayesian privacy
metric (called anti-inference privacy) and an information
theoretic metric (called indistinguishability) to characterize the
privacy offered by LinkMirage against adversaries with prior
information. In addition, the evolving social topologies introduce
another serious threat where sophisticated adversaries can
combine information available in multiple query results to infer
users’ social relationships. We define anti-aggregation privacy
in Section IV-D for evaluating the privacy performance of
LinkMirage against such adversaries.


C. Basic Theory 

Let us denote a time series of social graphs as $G_0, \cdots, G_t$.
For each temporal graph $G_t = (V_t, E_t)$, the set of vertices
is $V_t$ and the set of edges is $E_t$. For our theoretical analysis,
we focus on undirected graphs where all the $|E_t|$ edges are
symmetric, i.e., $(i, j) \in E_t$ iff $(j, i) \in E_t$. Note that our
approach can be generalized to directed graphs with asymmetric
edges. $P_t$ is the transition probability matrix of the Markov
chain on the vertices of $G_t$. $P_t$ measures the probability that
we follow an edge from one vertex to another vertex, where
$P_t(i, j) = 1/\deg(i)$ ($\deg(i)$ denotes the degree of vertex $i$) if
$(i, j) \in E_t$, otherwise $P_t(i, j) = 0$. A random walk starting
from vertex $v$ selects a neighbor of $v$ at random according to
$P_t$ and repeats the process.
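As an illustration of these definitions, the following Python sketch (hypothetical helper names, not the authors' implementation) stores an undirected graph as an adjacency list, evaluates the transition probability $P_t(i, j)$, and runs a random walk that moves to a uniformly chosen neighbor at each step, which is exactly sampling from the row $P_t(v, \cdot)$.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency list: (i, j) in E_t iff (j, i) in E_t."""
    adj = defaultdict(set)
    for i, j in edges:
        if i == j:
            continue  # ignore self-loops
        adj[i].add(j)
        adj[j].add(i)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def transition_prob(adj, i, j):
    """P_t(i, j) = 1/deg(i) if (i, j) is an edge, otherwise 0."""
    nbrs = adj.get(i, [])
    return 1.0 / len(nbrs) if j in nbrs else 0.0


def random_walk(adj, start, steps, rng):
    """Take `steps` hops, each to a neighbor chosen uniformly at random."""
    v = start
    for _ in range(steps):
        v = rng.choice(adj[v])
    return v
```

For example, on the edges (0, 1), (1, 2), (2, 0), (2, 3), vertex 2 has degree 3, so $P_t(2, 3) = 1/3$ while $P_t(0, 3) = 0$, and each row of $P_t$ sums to 1.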


D. System Overview and Roadmap 


Our objective for LinkMirage is to obfuscate social relationships
while balancing privacy for users’ social relationships
and the usability for large-scale real-world applications (as
will be stated in Section III-A). We deploy LinkMirage as
a Facebook application that implements graph construction
and obfuscation (as will be discussed in Section III-B). We
then describe the perturbation mechanism of LinkMirage in
Section III-C, where we take both the static and the temporal
social network topology into consideration. Our perturbation
mechanism consists of two steps: dynamic clustering, which
finds community structures in evolving graphs by simultaneously
considering consecutive graphs, and selective perturbation,
which perturbs the minimal number of edges in the
evolving graphs. Therefore, it is possible to use a very high
privacy parameter in the perturbation process, while preserving
structural properties such as community structures. We then
discuss the scalability of our algorithm in Section III-D and
visually show the effectiveness of our algorithm in Section III-E.
In Section IV, we rigorously analyze the privacy advantage
of LinkMirage over the state-of-the-art work by
considering three adversarial scenarios, including the worst-case
Bayesian adversary. In Section V, we apply our algorithm
to various real-world applications: anonymity systems, Sybil
defenses and privacy-preserving analytics. In Section VI, we
further analyze the effectiveness of LinkMirage in preserving
different kinds of graph structural properties.


III. LinkMirage System 


A. Design Goals 

We envision that applications relying on social relationships
between users can bootstrap this information from online
social network operators such as Facebook, Google+, and Twitter
with access to the users’ social relationships. To enable
these applications in a privacy-preserving manner, a perturbed
social graph topology (obtained by adding noise to the original graph
topology) should be available.

Fig. 2. Our perturbation mechanism for $G_t$. Assume that $G_{t-1}$ has already
been dynamically obfuscated, based on dynamic clustering (step 1) and
selective perturbation (step 2). Our mechanism analyzes the evolved graph
$G_t$ (step 3) and dynamically clusters $G_t$ (step 4) based on the freed $m$-hop
neighborhood ($m = 2$) of new links (between green and blue nodes), the
merged virtual node (the large red node in step 4), and the new nodes. By
comparing the communities in $G_{t-1}$ and $G_t$, we can implement selective
perturbation (step 5), i.e., perturb the changed blue community independently
and perturb the unchanged red and green communities in the same way as in
$G'_{t-1}$, and then perturb the inter-cluster links.

Social graphs evolve over time, and the third-party applications
would benefit from access to the most current version
of the graph. A baseline approach is to perturb each graph
snapshot independently. However, the sequence of perturbed
graphs provides significantly more observations to an adversary
than just a single perturbed graph. We argue that an effective
perturbation method should consider the evolution of the
original graph sequence. Therefore, the overall design
goals for our system are:

1) We aim to obfuscate social relationships while balancing 
privacy for users’ social relationships and the usability for 
real-world applications. 

2) We aim to handle both the static and dynamic social 
network topologies. 

3) Our system should provide rigorous privacy guarantees 
to defend against adversaries who have prior information 
of the original graphs, and adversaries who can combine 
multiple released graphs to infer more information. 

4) Our method should scale to real-world
large-scale social graphs.

B. LinkMirage: Deployment 

To improve the usability of our proposed obfuscation
approach (which will be described in detail in
Section III-C), and to avoid dependence on the OSN
providers, we developed a Facebook application (available at
https://apps.facebook.com/xxxx/¹) that implements graph
construction (via individual user subscriptions) and obfuscation.
The workflow of the LinkMirage deployment is as follows:
(i) When a user visits the above URL, Facebook checks the
credentials of the user, asks whether to grant the application
permission to access the user's friends, and then redirects the user
to the application hosting server. (ii) The application server
authenticates itself, and then queries Facebook for the information
of the user’s friends; Facebook returns their information, such as
their user IDs. The list of the user’s friends can then be collected
by the application server to construct a Facebook social graph for
the current timestamp. Leveraging LinkMirage, a perturbed graph
for this timestamp then becomes available, which preserves the link
privacy of the users’ social relationships.

¹Anonymized.
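The graph construction step of this workflow can be sketched as follows. This is a hypothetical illustration: it assumes the application server has already collected, for one timestamp, a mapping from each subscribed user's id to the ids of his/her friends, and it does not call any Facebook API. Each friendship is stored once as a sorted pair, matching the undirected graph model of Section II-C.

```python
def build_snapshot(friend_lists):
    """Merge per-user friend lists into one undirected snapshot (V_t, E_t).

    friend_lists: dict mapping a user id (a string) to the ids of that
    user's friends, as collected for a single timestamp (hypothetical input).
    """
    edges = set()
    for user, friends in friend_lists.items():
        for friend in friends:
            if friend != user:  # ignore self-references
                # (a, b) and (b, a) collapse to the same undirected edge.
                edges.add(tuple(sorted((user, friend))))
    vertices = set(friend_lists)
    for a, b in edges:
        # Friends who did not subscribe still appear as vertices.
        vertices.update((a, b))
    return vertices, edges
```

For example, the friend lists `{"alice": ["bob", "carol"], "bob": ["alice"], "dave": []}` yield the two edges (alice, bob) and (alice, carol) over four vertices; the friendship reported by both alice and bob is stored only once.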

Real-world systems such as Uproxy, Lantern, Kaleidoscope
[?], anonymity systems [?], [30], [?], and Sybil defense
systems [43] can directly benefit from our protocol by
automatically obtaining the perturbed social relationships.
Furthermore, our protocol can enable privacy-preserving graph
analytics for OSN providers. We give more detailed
explanations of the supported applications in Section III-F.


C. LinkMirage: Perturbation Algorithm 

Social networks evolve with time and publishing a time 
series of perturbed graphs raises a serious privacy challenge: 
an adversary can combine information available from multiple 
perturbed graphs over time to compromise the privacy of users’ 
social contacts [?], [?], [?]. In LinkMirage, we take a time
series of graph topologies into consideration, to account for
the evolution of the social networks. Intuitively, the scenario
with a static graph topology is just a special case of the
temporal graph sequence, and is thus inherently incorporated
in our model.

Consider a social graph series $G_0 = (V_0, E_0), \cdots, G_t = (V_t, E_t)$.
We want to transform the graph series to
$G'_0 = (V_0, E'_0), \cdots, G'_t = (V_t, E'_t)$ such that the vertices in $G'_t$
remain the same as in the original graph $G_t$, but the edges are
perturbed to protect link privacy. Moreover, while perturbing
the current graph $G_t$, LinkMirage has access to the past graphs
in the time series (i.e., $G_0, \cdots, G_{t-1}$). Our perturbation goal is
to balance the utility of social graph topologies and the privacy
of users’ social contacts across time.

Approach Overview: Our perturbation mechanism for
LinkMirage is illustrated in Fig. 2.

Static scenario: For a static graph $G_{t-1}$, we first cluster
it into several communities, and then perturb the links within
each community. The inter-cluster links are also perturbed to
protect their privacy.

Dynamic scenario: Let us suppose that $G_t$ evolves from
$G_{t-1}$ by the addition of new vertices (shown in blue). To
perturb graph $G_t$, our intuition is to consider the similarity
between graphs $G_{t-1}$ and $G_t$.

First, we partition $G_{t-1}$ and $G_t$ into subgraphs by clustering
each graph into different communities. To avoid randomness
in the clustering procedure (i.e., to guarantee consistency) and to
reduce the computational complexity, we dynamically cluster the
two graphs together instead of clustering them independently.
Noting that one green node evolves by connecting with a new
blue node, we free all the nodes located within $m = 2$ hops of
this green node (the other two green nodes and one red node)
and merge the remaining three red nodes into a big virtual node.
Then, we cluster these new nodes, the freed nodes and the
remaining virtual node to detect communities in $G_t$.

Next, we compare the communities within $G_{t-1}$ and $G_t$,
and identify the changed and unchanged subgraphs. For the
unchanged subgraphs $G_1, G_2$, we set their perturbation at
time $t$ to be identical to their perturbation at time $t-1$,

^We free the nodes from the previous clustering hierarchy.


Algorithm 1 LinkMirage, with dynamic clustering (steps
1-2) and selective perturbation (steps 3-6). The parameter
$k$ denotes the perturbation level for each community.
Here, ch, un, in are short for changed, unchanged, inter-community,
respectively.

Input: $\{G_t, G_{t-1}, G'_{t-1}\}$ if $t > 0$ or $\{G_t\}$ if $t = 0$;
Output: $G'_t$;
$G'_t, \mathcal{G}_t$ = null;
if $t = 0$:
    cluster $G_0$ to get $\mathcal{G}_0$;
    label $\mathcal{G}_0$ as changed, i.e., $\mathcal{G}_{0-ch} = \mathcal{G}_0$;
endif
/* Begin Dynamic Clustering */
1. free the nodes within $m$ hops of the changed links;
2. re-cluster the new nodes, the freed nodes, and the remaining
   merged virtual nodes in $\mathcal{G}_{t-1}$ to get $\mathcal{G}_t$;
/* End Dynamic Clustering */
/* Begin Selective Perturbation */
3. find the unchanged communities $\mathcal{G}_{t-un}$ and the changed
   communities $\mathcal{G}_{t-ch}$;
4. let $G'_{t-un} = G'_{(t-1)-un}$;
5. perturb $\mathcal{G}_{t-ch}$ to get $G'_{t-ch}$ via the static method;
6. foreach community pair $a$ and $b$:
       if both of the communities belong to $\mathcal{G}_{t-un}$:
           let $G'_{t-in}(a, b) = G'_{(t-1)-in}(a, b)$;
       else:
           foreach marginal node $v_a(i)$ in $a$ and $v_b(j)$ in $b$:
               randomly add an edge $(v_a(i), v_b(j))$ to $G'_{t-in}$ with probability
               $\frac{\deg(v_a(i))\,\deg(v_b(j))\,|v_a|}{|E_{ab}|\,(|v_a| + |v_b|)}$;
/* End Selective Perturbation */
return $G'_t = G'_{t-un} \cup G'_{t-ch} \cup G'_{t-in}$;


denoted by $G'_1, G'_2$. For the changed subgraph $G_3$, we perturb it
independently to obtain $G'_3$. We also perturb the links between
communities to protect the privacy of these inter-cluster links.
Finally, we publish $G'_t$ as the combination of $G'_1, G'_2, G'_3$ and
the perturbed inter-cluster links. There are two key steps in
our algorithm: dynamic clustering and selective perturbation,
which we describe in detail as follows.
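The composition described above can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than the authors' code: communities are vertex sets keyed by an id, and a community counts as unchanged when its Jaccard overlap with some previous community exceeds a threshold (the text only requires that the overlap exceeds a threshold; the Jaccard measure and the 0.8 default are assumptions). `perturb_static` is a placeholder for the static perturbation applied to changed communities.

```python
def jaccard(a, b):
    """Jaccard overlap of two vertex sets (an assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_communities(prev_comms, curr_comms, threshold=0.8):
    """Split the communities of G_t into unchanged and changed ones.

    prev_comms, curr_comms: dict mapping a community id to a frozenset of vertices.
    Returns ({curr_id: matching prev_id}, [changed curr_ids]).
    """
    unchanged, changed = {}, []
    for cid, verts in curr_comms.items():
        # Previous community with the largest overlap, if any.
        best = max(prev_comms, key=lambda pid: jaccard(verts, prev_comms[pid]),
                   default=None)
        if best is not None and jaccard(verts, prev_comms[best]) > threshold:
            unchanged[cid] = best
        else:
            changed.append(cid)
    return unchanged, changed


def assemble(curr_edges, prev_perturbed, unchanged, changed,
             perturb_static, inter_edges):
    """Publish G'_t as the union of reused, re-perturbed and inter-cluster links.

    curr_edges: dict curr_id -> edge set of that community in G_t.
    prev_perturbed: dict prev_id -> perturbed edge set of that community in G'_{t-1}.
    """
    published = set()
    for pid in unchanged.values():
        published |= prev_perturbed[pid]  # reuse the perturbation from t-1
    for cid in changed:
        published |= perturb_static(curr_edges[cid])  # perturb independently
    return published | inter_edges  # add the perturbed inter-cluster links
```

Reusing the previous perturbation for unchanged communities means that an adversary comparing $G'_{t-1}$ and $G'_t$ sees the same fake links again rather than a fresh, independent sample of the same underlying links.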

1) Dynamic Clustering: Considering that communities in
social networks change significantly over time, we need to
address the inconsistency problem by developing a dynamic
community detection method. Dynamic clustering aims to find
community structures in evolving graphs by simultaneously
considering consecutive graphs in its clustering algorithm.
There are several methods in the literature to cluster evolving
graphs [?], but we found them to be unsuitable for use
in our perturbation mechanism. One approach to dynamic
clustering involves performing community detection at each
timestamp independently, and then establishing relationships
between communities to track their evolution [?]. We found
that this approach suffers from performance issues induced
by inherent randomness in clustering algorithms, in addition
to the increased computational complexity. Another approach
is to combine multiple graphs into a single coupled graph
[?]. The coupled graph is constructed by adding edges between
the same nodes across different graphs. Clustering can
be performed on the single coupled graph. We found that
the clustering performance is very sensitive to the weights
of the added links, resulting in unstable clustering results.
Furthermore, the large dimensionality of the coupled graph
significantly increases the computational overhead.

TABLE I. Temporal Statistics of the Facebook Dataset.

Time            0        1        2        3        4        5        6        7        8
# of nodes      9,586    9,719    11,649   13,848   14,210   16,344   18,974   26,220   35,048
# of edges      48,966   38,058   47,024   54,787   49,744   58,099   65,604   97,095   142,274
Average degree  5.11     3.91     4.03     3.96     3.50     3.55     3.46     3.70

Fig. 3. Dynamic Facebook interaction dataset topology, for t = 3, 4, 5. On the left, we can see that LinkMirage has better utility than the baseline approach
(Mittal et al.), especially for larger values of k (due to dynamic clustering). On the right, we show the overlapped edges (black) and the changed edges (yellow)
between consecutive graphs: t = (3, 4) and t = (4, 5). We can see that in LinkMirage, the perturbation of unchanged communities is correlated across time (selective
perturbation), minimizing information leakage and enhancing privacy.

For our perturbation mechanism, we develop an adaptive
dynamic clustering approach for clustering the graph $G_t$ using
the clustering result for the previous graph $G_{t-1}$. This enables
our perturbation mechanism to (a) exploit the link correlation/similarity
in consecutive graph snapshots, and (b) reduce
computational complexity by avoiding repeated clustering for
unchanged links.

Clustering the graph $G_t$ from the clustering result of the
previous graph $G_{t-1}$ requires a backtracking strategy. We use
the maximum-modularity method for clustering, which is
hierarchical and thus easy to backtrack. Our backtracking strategy
is to first maintain a history of the merge operations that led to
the current clustering. When an evolution occurs, the algorithm
backtracks over the history of merge operations, in order to
incorporate the new additions and deletions in the graph.

More concretely, if the link between node $x$ and node
$y$ is changed (added or deleted), we omit all the $m$-hop
neighborhoods of $x$ and $y$, as well as $x$ and $y$ themselves,
from the clustering result of the previous timestamp, and then
perform re-clustering. All the new nodes, the changed nodes
and their $m$-hop neighbors, and the remaining merged nodes
in the previous clustering result are considered as basic
elements for clustering $G_t$ (recall Fig. 2).

For efficient implementation, we store the intermediate
results of the hierarchical clustering process in a data structure.
Upon link changes between $x$ and $y$, we free the $m$-hop
neighborhood of $x$ and $y$ from the stored data structure.
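The freeing step can be sketched as a multi-source breadth-first search truncated at depth $m$. The sketch below is a hypothetical illustration: it only computes which nodes are freed; the maximum-modularity re-clustering and the stored merge history are not shown, and passing the adjacency of $G_t$ (rather than, for example, the union of $G_{t-1}$ and $G_t$) is an assumption.

```python
from collections import deque


def m_hop_neighborhood(adj, sources, m):
    """Vertices within m hops of any source vertex (sources included).

    Multi-source breadth-first search that stops expanding at depth m.
    adj: dict vertex -> iterable of neighbors.
    """
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        if dist[v] == m:
            continue  # reached the m-hop boundary; do not expand further
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


def freed_nodes(adj, changed_links, m=2):
    """Endpoints x, y of every changed link plus their m-hop neighborhoods."""
    endpoints = {x for link in changed_links for x in link}
    return m_hop_neighborhood(adj, endpoints, m)
```

On a path 0-1-2-...-6 with the changed link (0, 1) and $m = 2$, the freed set is {0, 1, 2, 3}; all other nodes keep their merged (virtual-node) positions in the stored hierarchy. A new node that is not yet in `adj` is still returned as a source.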


2) Selective Perturbation:

Intra-cluster Perturbation: After clustering $G_t$ based on
$G_{t-1}$ using our dynamic clustering method, we perturb $G_t$
based on $G_{t-1}$ and the perturbed $G'_{t-1}$. First, we compare
the communities detected in $G_{t-1}$ and $G_t$, and classify them
as changed or unchanged. Our unchanged classification does
not require that the communities are exactly the same, but
that the overlap among vertices/links exceeds a threshold. Our
key idea is to keep the perturbation process for links in the
unchanged communities identical to their perturbation
in the previous snapshot. In this manner, we can preserve
the privacy of these unchanged links to the largest extent:
perturbing them again independently would give the adversary
a fresh observation of the same underlying links, and thus leak
more information. For the communities which are classified as
changed, our approach is to perturb their links independently
of the perturbation in the previous timestamp. For independent
perturbations, we leverage the static perturbation method of
Mittal et al. [29]. Their static perturbation deletes all the
edges in the original graph, and replaces each edge $(v, u)$
with a fake edge $(v, w)$ selected from the $k$-hop random walk
starting from $v$. A larger perturbation parameter $k$ corresponds
to better privacy and leads to worse utility.
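A minimal sketch of this static perturbation, following the description above, is given below. It assumes an adjacency-list input and a seeded `random.Random`; the retry rule for walks that return to $v$ or produce a duplicate fake edge is our own simplification, and the original method of Mittal et al. may handle these cases differently.

```python
import random


def static_perturb(adj, edges, k, rng, max_tries=10):
    """Replace each original edge (v, u) by a fake edge (v, w).

    w is the endpoint of a k-hop random walk on the original graph that
    starts from v. Walks that end at v, or that would duplicate an existing
    fake edge, are retried up to max_tries times and otherwise dropped
    (a simplification made for this sketch).
    adj: dict vertex -> list of neighbors in the original (sub)graph.
    """
    fake = set()
    for v, _u in edges:
        for _ in range(max_tries):
            w = v
            for _ in range(k):
                w = rng.choice(adj[w])  # uniform neighbor: one step of P_t
            edge = (min(v, w), max(v, w))
            if w != v and edge not in fake:
                fake.add(edge)
                break
    return fake
```

With $k = 1$ every fake edge is still an original edge (a one-hop walk ends at a neighbor), while larger $k$ moves the endpoints further from the original neighborhood, which is the privacy/utility trade-off stated above. On a 4-cycle, for instance, a 2-hop walk can only end at the start vertex or the opposite vertex, so the fake edges are diagonals.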

Inter-cluster Perturbation: Finally, we need to interconnect
the subgraphs identified above. Suppose that $|v_a|$ nodes and
$|v_b|$ nodes are connecting communities $a$ and $b$, respectively,
and they construct an inter-community subgraph. For each
marginal node $v_a(i) \in v_a$ and $v_b(j) \in v_b$ (here the marginal
node in community $a$ (resp. $b$) refers to the node that has neighbors
in the other community $b$ (resp. $a$)), we randomly connect
them with probability
$\frac{\deg(v_a(i))\,\deg(v_b(j))\,|v_a|}{|E_{ab}|\,(|v_a| + |v_b|)}$.
Here, all the computations for $\deg(\cdot)$, $|v_{\cdot}|$, and $|E_{ab}|$
only consider the marginal nodes. We can combine the perturbed links
corresponding to the unchanged communities, changed communities, and
inter-community subgraphs to compute the output of our algorithm,
i.e., $G'_t$.
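The inter-cluster step for one pair of communities can be sketched as follows. It links each pair of marginal nodes with probability deg(v_a(i)) · deg(v_b(j)) · |v_a| / (|E_ab| · (|v_a| + |v_b|)), where degrees count only links to the other community. The input format and the cap of the probability at 1 are assumptions of this sketch, which is illustrative rather than the authors' implementation.

```python
import random


def link_probability(deg_i, deg_j, n_a, n_b, e_ab):
    """deg(v_a(i)) * deg(v_b(j)) * |v_a| / (|E_ab| * (|v_a| + |v_b|))."""
    return deg_i * deg_j * n_a / (e_ab * (n_a + n_b))


def inter_cluster_perturb(marginal_a, marginal_b, deg, e_ab, rng):
    """Randomly reconnect the marginal nodes of communities a and b.

    marginal_a, marginal_b: lists of marginal nodes (nodes with neighbors in
    the other community). deg: node -> number of its links to the other
    community. e_ab: number of links between a and b.
    Probabilities above 1 are capped at 1 (an assumption of this sketch).
    """
    n_a, n_b = len(marginal_a), len(marginal_b)
    new_edges = set()
    for va in marginal_a:
        for vb in marginal_b:
            p = min(1.0, link_probability(deg[va], deg[vb], n_a, n_b, e_ab))
            if rng.random() < p:
                new_edges.add((va, vb))
    return new_edges
```

For example, with $\deg(v_a(i)) = 2$, $\deg(v_b(j)) = 3$, $|v_a| = 2$, $|v_b| = 3$ and $|E_{ab}| = 5$, the pair is linked with probability $2 \cdot 3 \cdot 2 / (5 \cdot 5) = 0.48$. Higher-degree marginal nodes are more likely to be reconnected, which is how the probability ties the perturbed links to the original degrees.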


LinkMirage not only preserves the structural characteristics 
of the original graph series, but also protects the privacy of 
the users by randomizing the original links. Compared to
prior work, our method provides stronger privacy and utility
guarantees for evolving graphs. Detailed procedures are stated
in Algorithm 1.

Surprisingly, our approach of first isolating communities
and then selectively perturbing them provides benefits even
in a static context! This is because previous static approaches
use a single parameter to control the privacy/utility trade-off.
Thus, if we apply them to the whole graph using high privacy
parameters, they destroy graph utility (e.g., community
structures). On the other hand, LinkMirage applies perturbations
selectively to communities; thus it is possible to use a
very high privacy parameter in the perturbation process, while
preserving structural properties such as community structures.

^This probability is set for the preservation of degree distributions as
analyzed in Section


D. Scalable Implementation 

Our algorithm relies on two key graph theoretical techniques:
community detection (which serves as a foundation for the
dynamic clustering step in LinkMirage) and random walk
(which serves as a foundation for the selective perturbation step in
LinkMirage). The computational complexity for both community
detection and random walk is $O(|E_t|)$ [?], where
$|E_t|$ is the number of edges in graph $G_t$; therefore, the
overall computational complexity of our approach is $O(|E_t|)$.
Furthermore, our algorithms are parallelizable. We adopt the
GraphChi parallel framework [?] to implement our algorithm
efficiently using a commodity workstation (3.6 GHz,
24 GB RAM). Our parallel implementation scales to very large
social networks; for example, the running time of LinkMirage
is less than 100 seconds for the large-scale Google+ dataset
(940 million links, described in Section IV-A) using
our commodity workstation.

E. Visual Depiction 

For our experiments, we consider a real-world Facebook
social network dataset [40] from the New Orleans regional
network, spanning from September 2006 to January 2009.
Here, we utilize the wall post interaction data, which represents
stronger trust relationships and comprises 46,952 nodes
(users) connected by 876,993 edges. We partitioned the dataset
using three-month intervals to construct a total of 9 graph
instances, as shown in Table I. Fig. 3 depicts the outcome of
our perturbation algorithm on the partitioned Facebook graph
sequence with timestamps $t = 3, 4, 5$ (out of 9 snapshots),
for varying perturbation parameter $k$ (the perturbation parameter
for each community). For comparative analysis, we consider
a baseline approach [29] that applies static perturbation for
each timestamp independently. In the dynamic clustering step
of our experiments, we free the two-hop neighborhoods of the
changed nodes, i.e., $m = 2$.

The maximum-modularity clustering method yields two
communities for $G_3$, three communities for $G_4$, and four
communities for $G_5$. For the perturbed graphs, we use the
same color for the vertices as in the original graph, and
we can see that fine-grained structures (related to utility)
are preserved for both algorithms under a small perturbation
parameter $k$, even though links are randomized. Even for high
values of $k$, LinkMirage can preserve the macro-level (such
as community-level) structural characteristics of the graph. On
the other hand, for high values of $k$, the static perturbation
algorithm results in the loss of structural properties, and the
perturbed graph resembles a random graph. Thus, our approach of first
isolating communities and applying perturbation at the level
of communities has benefits even in a static context.

Fig. [?] also shows the privacy benefits of our perturbation algorithm for timestamps t = 4, 5. LinkMirage reuses perturbed links (shown as black unchanged links) in the unchanged communities (one unchanged community for t = 4 and two unchanged communities for t = 5). Therefore, LinkMirage preserves the privacy of users' social relationships by accounting for correlations across the graph sequence, and this benefit does not come at the cost of utility. In the following sections, we formally quantify the privacy and utility properties of LinkMirage.


F. Supporting Applications

As discussed in Section II-A, LinkMirage supports three types of applications. 1) Global access to obfuscated graphs: real-world applications can use our protocol to automatically obtain the obfuscated social graphs that social relationship based systems require. For instance, Tor operators [?] (or operators of other anonymous communication networks such as Pisces [?]) can leverage the perturbed social relationships to set up anonymous circuits. 2) Local access to the obfuscated graphs: an individual user can query our protocol for his/her perturbed friends (local neighborhood information) to implement distributed applications such as SybilLimit [43]. 3) Mediated data analysis: OSN providers can also publish perturbed graphs produced by LinkMirage to facilitate privacy-preserving data-mining research, e.g., to implement graph analytics such as the PageRank score [35] and modularity while mitigating disclosure of users' social relationships. Existing work [?], [?] has demonstrated that running graph analytic algorithms can leak information. Rather than repeatedly adding perturbation to the output of every graph analytic algorithm, which is costly, OSN providers can first obtain the perturbed graphs via LinkMirage and then run these graph analytics in a privacy-preserving manner.


IV. Privacy Analysis


We now turn to the link privacy of LinkMirage. We propose three privacy metrics, anti-inference privacy, indistinguishability, and anti-aggregation privacy, to evaluate the link privacy provided by LinkMirage. Both theoretical analysis and experimental results with a Facebook dataset (877K links) and a large-scale Google+ dataset (940M links) show the benefits of LinkMirage over previous approaches. We also illustrate the relationship between our privacy metric and differential privacy.


A. Experimental Datasets 


To illustrate how temporal information degrades privacy, we consider two social network datasets. The first is a large-scale Google+ dataset [?], whose temporal statistics are shown in Table II. To the best of our knowledge, this is the largest temporal social network dataset in the public domain. The Google+ dataset was crawled from July 2011 to October 2011 and has 28,942,911 nodes and 947,776,172 edges. It only records link additions, i.e., all edges in previous graphs exist in the current graph. We partitioned the dataset into 84 timestamps. The second is the 9-timestamp Facebook wall posts dataset [40] described in Section III-E, with temporal characteristics shown in Table I. Note that the wall posts data experiences tremendous churn, with only 45% overlap between consecutive graphs. Since our dynamic perturbation method relies on the correlation between consecutive graphs, the evaluation of our dynamic method on the Facebook wall posts data is conservative. To show the performance of our algorithm on graphs that evolve at a slower rate, we also consider a sampled graph sequence extracted from the Facebook wall posts data with 80% overlap between consecutive graphs.

Fig. 4. (a),(b) Link probability distributions for the whole Facebook interaction dataset and the sampled Facebook interaction dataset with 80% overlap. Curves: prior probability; Mittal et al. and LinkMirage, each with k = 5 and k = 20. The posterior probability of LinkMirage is closer to the prior probability than that of the baseline approach.

TABLE II. Temporal Statistics of the Google+ Dataset.

Snapshot   # of nodes    # of edges     Average degree
1          16,165,781    505,527,124    31.2714
2          17,483,936    560,576,194    32.0624
3          17,850,948    575,345,552    32.2305
4          19,406,327    654,523,658    33.7273
5          19,954,197    686,709,660    34.4143
6          24,235,387    759,226,300    31.3272
7          28,035,472    886,082,314    31.6058
8          28,942,911    947,776,172    32.7464
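The "Average degree" row of Table II appears to equal the number of edges divided by the number of nodes (with no factor of 2). The short Python check below tests that reading using only the counts in the table; the interpretation itself is an assumption, not something the text states.

```python
# Node and edge counts copied from Table II (eight Google+ snapshots).
nodes = [16_165_781, 17_483_936, 17_850_948, 19_406_327,
         19_954_197, 24_235_387, 28_035_472, 28_942_911]
edges = [505_527_124, 560_576_194, 575_345_552, 654_523_658,
         686_709_660, 759_226_300, 886_082_314, 947_776_172]
reported = [31.2714, 32.0624, 32.2305, 33.7273,
            34.4143, 31.3272, 31.6058, 32.7464]

# Assumption: "average degree" = |E| / |V|, i.e. edges counted once per link.
ratios = [e / n for e, n in zip(edges, nodes)]
for r, rep in zip(ratios, reported):
    print(f"{r:.4f}  (table: {rep:.4f})")
```

All eight ratios agree with the table to four decimal places, which supports the |E|/|V| reading.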


B. Anti-Inference Privacy 

First, we consider adversaries that aim to infer link information by leveraging Bayesian inference. We define the privacy of a link $L_t$ (or a subgraph) in the $t$-th graph instance as the difference between the posterior probability and the prior probability of the existence of the link (or subgraph), computed by the adversary using its prior information $W$ and its knowledge of the perturbed graph sequence $\{G'_i\}_{i=0}^{t}$. Using Bayesian inference, we have the following definition.


Definition 1: For a link $L_t$ in the original graph sequence $G_0, \dots, G_t$ and the adversary's prior information $W$, the anti-inference privacy of the perturbed graph sequence $G'_0, \dots, G'_t$ is evaluated by the similarity between the posterior probability $P(L_t \mid \{G'_i\}_{i=0}^{t}, W)$ and the prior probability $P(L_t \mid W)$, where the posterior probability is

$$P(L_t \mid \{G'_i\}_{i=0}^{t}, W) = \frac{P(\{G'_i\}_{i=0}^{t} \mid L_t, W) \times P(L_t \mid W)}{P(\{G'_i\}_{i=0}^{t} \mid W)} \tag{1}$$

Higher similarity implies better anti-inference privacy.


The difference between the posterior probability and the prior probability represents the information leaked by the perturbation mechanism; similar intuition appears in [?]. Therefore, the posterior probability should not differ much from the prior probability.

In the above expression, $P(L_t \mid W)$ is the prior probability of the link, which can be computed from known structural properties of social networks, for example, by using link prediction algorithms [?]. Note that $P(\{G'_i\}_{i=0}^{t} \mid W)$ is a normalization constant that can be analyzed by sampling techniques. The key challenge is to compute $P(\{G'_i\}_{i=0}^{t} \mid L_t, W)$.¹


¹The detailed process for computing the posterior probability can be found




Fig. 5. Link probability distribution for the Google+ dataset under the adversary's prior information extracted from the social-attribute network model.



For evaluation, we consider a special case where the adversary's prior is the entire time series of original graphs except the link $L_t$ (the link whose privacy we want to quantify; $L_t = 1$ denotes that the link exists and $L_t = 0$ that it does not). Such prior information can be extracted from personal public information, Facebook-related information, or other application-related information, as stated in [?]. Note that this is a very strong adversarial prior, which leads to a worst-case analysis of link privacy. Denoting by $\{G_i(L_t)\}_{i=0}^{t}$ the prior, which contains all the information except $L_t$, the posterior probability of link $L_t$ in the worst case is

$$P(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t}) = \frac{P(\{G'_i\}_{i=0}^{t} \mid L_t, \{G_i(L_t)\}_{i=0}^{t}) \times P(L_t \mid \{G_i(L_t)\}_{i=0}^{t})}{P(\{G'_i\}_{i=0}^{t} \mid \{G_i(L_t)\}_{i=0}^{t})}$$

where

$$P(\{G'_i\}_{i=0}^{t} \mid L_t, \{G_i(L_t)\}_{i=0}^{t}) = P(G'_0 \mid G_0(L_t)) \times P(G'_1 \mid G'_0, G_0(L_t), G_1(L_t)) \cdots P(G'_t \mid G'_{t-1}, G_{t-1}(L_t), G_t(L_t))$$

Therefore, the objective of perturbation algorithms is to make $P(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t})$ close to $P(L_t \mid \{G_i(L_t)\}_{i=0}^{t})$.
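To make Bayes' rule and the chain-rule factorization concrete, the Python sketch below computes the worst-case posterior of a single binary link from per-snapshot likelihood factors. The factor values are hypothetical inputs; in LinkMirage they would come from the perturbation process, which this sketch does not model. It shows that when every released snapshot is equally likely with or without the link, the posterior equals the prior (no leakage), while consistently informative snapshots push the posterior further from the prior as t grows.

```python
from math import prod


def sequence_likelihood(step_factors):
    """Chain-rule product P(G'_0 | ...) * P(G'_1 | G'_0, ...) * ... * P(G'_t | G'_{t-1}, ...)
    for one hypothesised value of the link indicator L_t."""
    return prod(step_factors)


def worst_case_posterior(prior, factors_if_link, factors_if_no_link):
    """P(L_t = 1 | perturbed sequence, all other links) via Bayes' rule.

    prior              -- P(L_t = 1 | {G_i(L_t)}), e.g. from link prediction
    factors_if_link    -- hypothetical per-snapshot likelihood factors assuming L_t = 1
    factors_if_no_link -- hypothetical per-snapshot likelihood factors assuming L_t = 0
    """
    joint_link = sequence_likelihood(factors_if_link) * prior
    joint_no_link = sequence_likelihood(factors_if_no_link) * (1.0 - prior)
    # The denominator is the normalising constant P({G'_i} | {G_i(L_t)}).
    return joint_link / (joint_link + joint_no_link)


# Hypothetical numbers: each released snapshot is 1.5x more likely if the link exists.
prior = 0.5
for t in range(1, 4):
    post = worst_case_posterior(prior, [0.6] * t, [0.4] * t)
    print(t, round(post, 4), "gap to prior:", round(abs(post - prior), 4))
```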
Comparison with previous work: Fig. 4 shows the posterior probability distribution for the whole Facebook graph sequence and for the sampled Facebook graph sequence with an 80% overlap ratio. We computed the prior probability using the link prediction method in [?]. We can see that the posterior probability corresponding to LinkMirage is closer to the prior probability than that of the method of Mittal et al. [29]. In Fig. 4(b), at the point where the link probability equals 0.1, the distance between the posterior CDF and the prior CDF for the static approach is a factor of 3 larger than for LinkMirage (k = 20). A larger perturbation degree k improves privacy and leads to a smaller difference from the prior probability. Finally, comparing Fig. 4(a) and (b), we can see that larger overlap in the graph sequence increases the privacy benefits of LinkMirage. We also compare with the work of Hay et al. [?], which randomizes the graph by deleting r real links and introducing r fake links. The probability that a real link is preserved in the perturbed graph is 1 − r/m (where m is the number of links), which cannot be small, otherwise utility is not preserved. Even with r/m = 0.5 (which would already substantially hurt utility [?]), the posterior probability for a link under the method of Hay et al. would be 0.5, even without prior information. In contrast, our analysis of LinkMirage assumes a worst-case prior and still shows that the posterior probability is smaller than 0.5 for more than 50% of the links when k = 20 in Fig. 4. Therefore, LinkMirage provides significantly higher privacy than the work of Hay et al.

Fig. 6. (a),(b) Temporal indistinguishability for the whole Facebook interaction dataset and the sampled Facebook interaction dataset with 80% overlap. Over time, the adversary has more information, resulting in decreased indistinguishability. LinkMirage has higher indistinguishability than the static method and the method of Hay et al. [?], although it still suffers from some information leakage.

Adversaries with structural and contextual information:

Our analysis so far quantifies link privacy under an adversary with prior information about the original network structure (including link prediction capabilities). Some adversaries may additionally have contextual information about users in the social network, such as user attributes, which can also be used to predict links (e.g., the social-attribute network prediction model in [?]). We further computed the prior probability using this social-attribute network prediction model [?] and show the link probability for the Google+ dataset in Fig. 5. The posterior probability of LinkMirage is closer to the prior probability, and thus LinkMirage achieves better privacy than previous work.


C. Indistinguishability

Based on the worst-case posterior probability of a link, $P(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t})$, we further quantify privacy for adversaries who aim to distinguish the posterior probability from the prior probability. Since our goal is to reduce the information leakage of $L_t$ given the perturbed graphs $\{G'_i\}_{i=0}^{t}$ and the prior knowledge $\{G_i(L_t)\}_{i=0}^{t}$, we use the metric of indistinguishability to quantify privacy, which can be evaluated by the conditional entropy of a private message given the observed variables [?]. The objective of an obfuscation scheme is to maximize the indistinguishability of the unknown input $I$ given the observables $O$, i.e., $H(I \mid O)$ (where $H$ denotes the entropy of a variable [7]). Here, we define our metric for link privacy as follows.

Definition 2: The indistinguishability for a link $L_t$ in the original graph $G_t$ that the adversary can infer from the perturbed graphs $\{G'_i\}_{i=0}^{t}$ under the prior information $\{G_i(L_t)\}_{i=0}^{t}$ is defined as $H(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t})$.
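Definition 2 is a conditional entropy. For a binary link indicator, H(L | O) is the average, over observations o, of the binary entropy of the posterior P(L = 1 | O = o). The Python sketch below computes it for small hand-written joint distributions (hypothetical examples, not derived from the datasets): it gives 1 bit when the observations carry no information about the link and 0 bits when they reveal it.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable such as the link indicator L_t."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def conditional_entropy(joint):
    """H(L | O) for binary L, given joint[(l, o)] = P(L = l, O = o).

    O stands in for everything the adversary observes (perturbed graphs and prior)."""
    total = 0.0
    for o in {obs for (_, obs) in joint}:
        p_link = joint.get((1, o), 0.0)
        p_o = p_link + joint.get((0, o), 0.0)
        if p_o > 0.0:
            # P(O = o) times the entropy of the posterior P(L = 1 | O = o)
            total += p_o * binary_entropy(p_link / p_o)
    return total


# Hypothetical joint distributions (not derived from any dataset).
uninformative = {(1, "a"): 0.25, (0, "a"): 0.25, (1, "b"): 0.25, (0, "b"): 0.25}
partially = {(1, "a"): 0.4, (0, "a"): 0.1, (1, "b"): 0.1, (0, "b"): 0.4}
revealing = {(1, "a"): 0.5, (0, "b"): 0.5}
for name, joint in [("uninformative", uninformative), ("partial", partially), ("revealing", revealing)]:
    print(name, round(conditional_entropy(joint), 4))
```

The partially informative case sits strictly between the two extremes, which is the regime a perturbation scheme aims to keep close to the uninformative end.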

Furthermore, we quantify the behavior of indistinguishability over time. For our analysis, we continue to consider the worst-case prior, in which the adversary knows the entire graph sequence except the link $L_t$. To make the analysis tractable, we add the condition that if the link $L$ exists, then it exists in all the graphs (link deletions are rare in real-world social networks). For a large-scale graph, a single link does not affect the clustering result. Then, we have

Theorem 1: The indistinguishability does not increase with time:

$$H(L \mid \{G'_i\}_{i=0}^{t}, \{G_i(L)\}_{i=0}^{t}) \ge H(L \mid \{G'_i\}_{i=0}^{t+1}, \{G_i(L)\}_{i=0}^{t+1}) \tag{2}$$

The inequality follows from the theorem that conditioning reduces entropy [?]. Eq. (2) shows that the indistinguishability does not increase as time evolves, because over time the adversary can use multiple perturbed graphs to infer more information about link $L$.

Next, we theoretically show why LinkMirage has better privacy performance than the static method. For each graph $G_t$, denote the perturbed graphs produced by LinkMirage and by the static method as $G'_t$ and $G'^{s}_t$, respectively.

Theorem 2: The indistinguishability for LinkMirage is at least that of the static perturbation method, i.e.,

$$H(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t}) \ge H(L_t \mid \{G'^{s}_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t}) \tag{3}$$

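Proof: In LinkMirage, the perturbation of the current graph $G_t$ is based on the perturbation of $G_{t-1}$. Denote the changed subgraph between $G_{t-1}$ and $G_t$ as $G_{t\text{-}ch}$; only this part is freshly perturbed, yielding $G'_{t\text{-}ch}$, while the rest of $G'_t$ is reused from $G'_{t-1}$. Then

$$\begin{aligned}
H(L_t \mid G'_{t-1}, G'_t, \{G_i(L_t)\}_{i=0}^{t}) &= H(L_t \mid G'_{t-1}, G'_t - G'_{t\text{-}ch}, G'_{t\text{-}ch}, \{G_i(L_t)\}_{i=0}^{t}) \\
&\ge H(L_t \mid G'_{t-1}, G'^{s}_t, \{G_i(L_t)\}_{i=0}^{t}) \\
H(L_t \mid \{G'_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t}) &\ge H(L_t \mid \{G'^{s}_i\}_{i=0}^{t}, \{G_i(L_t)\}_{i=0}^{t})
\end{aligned}$$

where the first inequality again follows from the theorem that conditioning reduces entropy [?], and the second inequality generalizes the first from a single snapshot $t$ to the entire sequence. From Eq. (3), we can see that LinkMirage may offer superior indistinguishability compared to static perturbation, and thus provides higher privacy. ■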

Comparison with previous work: Next, we experimentally analyze our indistinguishability metric over time. Fig. 6 depicts the indistinguishability metric for the whole Facebook graph sequence and the sampled Facebook graph sequence with 80% overlap. The static perturbation leaks more information over time. In contrast, selective perturbation achieves significantly higher indistinguishability. In Fig. 6(a), after 9 snapshots and with k = 5, the indistinguishability of the static perturbation method is roughly 1/10 of that of LinkMirage. This is because selective perturbation explicitly takes the temporal evolution into consideration and stems privacy degradation via the selective perturbation step. Comparing Fig. 6(a) and (b), the advantage of LinkMirage is larger for graph sequences with more overlap.

We also compare with the work of Hay et al. [?]. For the first timestamp, the probability that a real link is preserved in the anonymized graph is 1 − r/m. As time evolves, this probability decreases to (1 − r/m)^t. Combined with the prior probability, the corresponding indistinguishability for the method of Hay et al. is shown as the black dotted line in Fig. 6, which converges to 0 very quickly (we again consider r/m = 0.5, which would substantially hurt utility [?]). Compared with the work of Hay et al., LinkMirage significantly improves privacy. Even when t = 1, LinkMirage with k = 20 achieves up to a 10x improvement in indistinguishability over the approach of Hay et al.
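The decay of the link-preservation probability for the method of Hay et al. can be tabulated directly. The Python sketch below assumes each snapshot is randomized independently, so a real link must survive t independent rounds, each with probability 1 − r/m; the function name is illustrative only.

```python
def hay_preservation_probability(r_over_m, t):
    """Chance that a real link survives t independent randomisations, each of which
    keeps it with probability 1 - r/m (independence across snapshots is assumed)."""
    return (1.0 - r_over_m) ** t


# r/m = 0.5, the setting discussed in the text.
for t in range(1, 10):
    print(t, hay_preservation_probability(0.5, t))
```

With r/m = 0.5 the probability halves at every snapshot, falling below 0.002 after nine snapshots.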

D. Anti-aggregation Privacy

Next, we consider adversaries who try to aggregate all previously published graphs to infer more information. Recall that after community detection, our algorithm anonymizes the links by leveraging k-hop random walks. Therefore, the perturbed graph $G'$ is effectively a sampling of the k-hop graph $G^k$, in which all k-hop neighbors in the original graph are connected. Intuitively, a larger difference between $G^k$ and $G'$ represents better privacy. We use the distance between the corresponding transition probability matrices, $\|P_k - P'\|_{TV}$,² to measure this difference, where $P_k$ and $P'$ are the transition probability matrices of $G^k$ and $G'$. We extend the definition of total variation distance [?] from vectors to matrices by averaging the total variation distance of each row, i.e., $\|P_k - P'\|_{TV} = \frac{1}{|V|}\sum_{v \in V} \|P_k(v) - P'(v)\|_{TV}$, where $P_k(v)$ and $P'(v)$ denote the $v$-th rows of $P_k$ and $P'$. We then formally define the anti-aggregation privacy as follows.

²We choose the total variation distance to evaluate the statistical distance between $P_k$ and $P'$ as in [?].

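Definition 3: The anti-aggregation privacy of a perturbed graph $G'_t$ with respect to the original graph $G_t$ and the perturbation parameter $k$ is $\mathrm{Privacy}(G_t, G'_t, k) = \|P_k - P'_t\|_{TV}$, where $P_k$ and $P'_t$ are the transition probability matrices of the k-hop graph $G_t^k$ and of $G'_t$.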
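Definition 3 compares two random-walk transition matrices row by row. The Python sketch below builds the k-hop graph by breadth-first search, forms the transition matrices, and averages the per-row total variation distance (half the L1 distance). The four-node path and the "perturbed" graph are hypothetical toy inputs chosen by hand, not LinkMirage output; the code assumes undirected graphs without isolated nodes.

```python
def transition_matrix(adj):
    """Random-walk matrix of an undirected graph {node: set(neighbours)}.
    Rows and columns follow sorted(adj); assumes no isolated nodes."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    P = [[0.0] * len(nodes) for _ in nodes]
    for v in nodes:
        for u in adj[v]:
            P[index[v]][index[u]] = 1.0 / len(adj[v])
    return P


def k_hop_graph(adj, k):
    """Connect each node to every other node within k hops (breadth-first search)."""
    result = {}
    for source in adj:
        seen, frontier = {source}, {source}
        for _ in range(k):
            frontier = {u for v in frontier for u in adj[v]} - seen
            seen |= frontier
        result[source] = seen - {source}
    return result


def row_averaged_tv(P, Q):
    """(1/|V|) * sum over rows v of ||P(v) - Q(v)||_TV, where TV is half the L1 distance."""
    per_row = [0.5 * sum(abs(p - q) for p, q in zip(rp, rq)) for rp, rq in zip(P, Q)]
    return sum(per_row) / len(per_row)


# Hypothetical toy example: a 4-node path and a hand-picked sample of its 2-hop graph.
G = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
G_k = k_hop_graph(G, 2)
G_perturbed = {0: {2}, 1: {3}, 2: {0}, 3: {1}}
privacy = row_averaged_tv(transition_matrix(G_k), transition_matrix(G_perturbed))
print(round(privacy, 4))
```

A perturbed graph identical to $G^k$ would score 0 (no privacy); keeping only a sparse sample of k-hop edges raises the distance.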

The adversary's final objective is to obtain an estimate of the original graph, e.g., an estimated transition probability matrix $\hat{P}_t$ that satisfies $\hat{P}_t^k = P'_t$. A straightforward way to evaluate privacy is to compute the estimation error of the transition probability matrix, i.e., $\|P_t - \hat{P}_t\|_{TV}$. We can derive the following relationship between the anti-aggregation privacy and the estimation error (we defer the proofs to the Appendix to improve readability).

Theorem 3: The anti-aggregation privacy is (up to a factor of $k$) a lower bound on the adversary's estimation error:

$$\|P_k - P'_t\|_{TV} \le k\,\|P_t - \hat{P}_t\|_{TV} \tag{4}$$

We further consider network evolution, where the adversary can combine all previously perturbed graphs to extract more k-hop information about the current graph. In this situation, a strategic approach for the adversary is to combine the perturbed graph series $G'_0, \dots, G'_t$ into a new perturbed graph $\tilde{G}'_t = \bigcup_{i=0}^{t} G'_i$. The combined perturbed graph $\tilde{G}'_t$ contains more information about the k-hop graph $G^k_t$ than $G'_t$ does. Correspondingly, the transition probability matrix $\tilde{P}'_t$ of the combined graph provides more information than $P'_t$. That is to say, the anti-aggregation privacy decreases with time.

Comparison with previous work: We evaluate the anti-aggregation privacy of LinkMirage on both the Google+ dataset and the Facebook dataset. We perform our experiments under the conservative assumption that a link always exists after it is introduced. As shown in Fig. 7, the anti-aggregation privacy decreases with time, since more information about the k-hop neighbors of the graph is leaked. Our selective perturbation preserves the correlation between consecutive graphs, and therefore leaks less information and achieves better privacy than the static baseline method. For the Google+ dataset, the anti-aggregation privacy of the method of Mittal et al. is only 1/10 of that of LinkMirage after 84 timestamps.

Fig. 7. (a),(b) Temporal anti-aggregation privacy for the Google+ dataset and the Facebook dataset, respectively. The anti-aggregation privacy decreases as time evolves because more information is leaked as more perturbed graphs become available. Leveraging selective perturbation, LinkMirage achieves much better anti-aggregation privacy than the static baseline method.

E. Relationship with Differential Privacy

Our anti-inference privacy analysis considers the worst-case adversarial prior for inferring the existence of a link in the graph. Next, we uncover a novel relationship between this anti-inference privacy and differential privacy.

Differential privacy is a popular framework for evaluating the privacy of a perturbation scheme [?], [?], [?], [?]. The framework defines the local sensitivity of a query function $f$ on a dataset $D_1$ as the maximal $\|f(D_1) - f(D_2)\|_1$ over all $D_2$ differing from $D_1$ in at most one element, i.e., $\Delta f = \max_{D_2} \|f(D_1) - f(D_2)\|_1$. Based on the theory of differential privacy, a mechanism that adds independent Laplacian noise with parameter $\Delta f/\epsilon$ to the query function $f$ satisfies $\epsilon$-differential privacy. The amount of added noise, which determines the utility of the mechanism, depends on the local sensitivity. To achieve good utility as well as privacy, the local sensitivity $\Delta f$ should be as small as possible. The following remark demonstrates the effectiveness of worst-case Bayesian analysis: the objective for a good utility-privacy balance under our worst-case Bayesian analysis is equivalent to that under differential privacy.
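The Laplace mechanism described above can be sketched with the Python standard library: the difference of two independent exponential variables with mean b is Laplace-distributed with scale b. The query (an edge count) and the sensitivity value are hypothetical illustrations. Note that the usual ε-differential-privacy guarantee of this mechanism requires Δf to bound the change of f over every pair of neighboring datasets.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials
    with mean `scale` is Laplace-distributed with that scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release f(D) + Lap(sensitivity / epsilon)."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# Hypothetical query: an edge count, which changes by at most 1 when one link changes.
for eps in (0.1, 1.0):
    print(eps, laplace_mechanism(1000, 1.0, eps, rng))
```

Smaller ε or larger Δf widens the noise, which is the utility cost the text refers to.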

Remark 1: The requirement for good utility-privacy balance 
in differential privacy is equivalent to the objective of our 
Bayesian analysis under the worst case. (We defer the proofs 
to Appendix to improve readability.) 

F. Summary for Privacy Analysis

• LinkMirage provides rigorous privacy guarantees against adversaries who have prior information about the original graphs, and against adversaries who aim to combine multiple released graphs to infer more information.

• LinkMirage shows significant advantages in anti-inference privacy, indistinguishability, and anti-aggregation privacy, outperforming previous methods by up to a factor of 10.

V. Applications

Applications such as anonymous communication [?], [?] and vertex anonymity mechanisms [?] can use LinkMirage to obtain entire obfuscated social graphs. Alternatively, each individual user can query LinkMirage for his/her perturbed neighborhood to set up distributed social relationship based applications such as SybilLimit [43]. Further, OSN providers can leverage LinkMirage to perturb the original social topologies only once and support multiple privacy-preserving graph analytics, e.g., privately computing the PageRank or modularity of social networks.

A. Anonymous Communication [?]

As a concrete application, we consider the problem of anonymous communication [?], [?]. Systems for anonymous communication aim to improve users' privacy by hiding the communication link between the user and the remote destination. Nagaraja et al. and others [?], [?], [?] have suggested that the security of anonymity systems can be improved by leveraging users' trusted social contacts.

We envision that our work can be a key enabler for the design of such social network based systems while preserving the privacy of users' social relationships. We restrict our analysis to low-latency anonymity systems that leverage social links, such as the Pisces protocol [?].

Fig. 8. The worst-case probability of deanonymizing users' communications (f = 0.1). Over time, LinkMirage provides better anonymity than the static approaches.

Similar to the Tor protocol, users in Pisces rely on proxy servers and onion routing for anonymous communication. However, the relays on the onion routing path are chosen by performing a random walk on a trusted social network topology. Recall that LinkMirage better preserves the evolution of temporal graphs in Fig. [?]. We now show that this translates into improved anonymity over time by analyzing the degradation of user anonymity over multiple graph snapshots. For each graph snapshot, we consider the following worst-case anonymity analysis: if a user's neighbor in the social topology is malicious, then over multiple communication rounds (within that graph instance) the user's anonymity will be compromised by state-of-the-art traffic analysis attacks [?]. Now suppose that all of a user's neighbors in the first graph instance are honest. As the perturbed graph sequence evolves, user anonymity can degrade further, since in subsequent instances the user may connect to a malicious neighbor. Suppose each node is malicious with probability $f$, and denote by $n_t(v)$ the set of distinct neighbors of node $v$ at time $t$. For a temporal graph sequence, the number of union neighbors $|\bigcup_{k=0}^{t} n_k(v)|$ of $v$ increases with time, and the probability for $v$ to be attacked in the worst case is $1 - (1-f)^{|\bigcup_{k=0}^{t} n_k(v)|}$. Note that in practice, the adversary's prior information will be significantly less than that of this worst-case adversary.
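The worst-case attack probability depends only on how many distinct neighbors a user accumulates across snapshots. The Python sketch below evaluates $1 - (1-f)^{|\bigcup_k n_k(v)|}$ for two hypothetical neighborhood histories, one stable (as when perturbed links are reused) and one that churns completely at every snapshot; the node labels are made up for illustration.

```python
def worst_case_attack_probability(neighbour_sets, f):
    """1 - (1 - f)^{|union of n_k(v)|}: chance that at least one distinct
    neighbour seen so far is malicious, each independently with probability f."""
    distinct = set().union(*neighbour_sets)
    return 1.0 - (1.0 - f) ** len(distinct)


# Hypothetical neighbourhoods of one user v over three perturbed snapshots.
stable = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}]
churning = [{"a", "b", "c"}, {"d", "e", "f"}, {"g", "h", "i"}]
for name, seq in [("stable", stable), ("churning", churning)]:
    probs = [round(worst_case_attack_probability(seq[: t + 1], 0.1), 4) for t in range(len(seq))]
    print(name, probs)
```

After three snapshots the stable history has exposed 4 distinct neighbors and the churning one 9, which is why neighborhood similarity across snapshots limits anonymity degradation.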

Fig. 8 depicts the degradation of worst-case anonymity with respect to the number of perturbed topologies. The attack probability for our method is lower than for the static approach by up to a factor of 2. This is because over consecutive graph instances, users' social neighborhoods are more similar than under the static approach, reducing the potential for anonymity degradation. Therefore, LinkMirage can provide better security for anonymous communication and other social trust based applications.

Fig. 9. (a) False positive rate for Sybil defenses: the perturbed graphs have a lower false positive rate than the original graph. Random walk length is proportional to the number of Sybil identities that can be inserted in the system. (b) The final attack edges are roughly the same for the perturbed graphs and the original graphs.


B. Vertex Anonymity [?], [?]


Previous work on vertex anonymity [?], [?], [45] can be defeated by de-anonymization techniques [20], [32], [?], [?]. LinkMirage can serve as a formal first step for vertex anonymity, and can even improve its resilience against de-anonymization attacks. We apply LinkMirage to anonymize vertices, i.e., to publish a perturbed topology without labeling any vertex. In [?], Ji et al. modeled anonymization as a sampling process where the sampling probability $p$ denotes the probability that an edge in the original graph $G_0$ exists in the anonymized graph $G'$. LinkMirage can also be cast in this model, where the perturbed graph $G'$ is sampled from the k-hop graph $G^k$ (corresponding to $G_0$).

They also derived a theoretical bound on the sampling probability $p$ for perfect de-anonymization, and found that a weaker bound suffices for larger values of $p$. A larger $p$ implies that $G'$ is topologically more similar to $G$, making perfect de-anonymization easier. When considering social network evolution, the sampling probability can be estimated as $p = |E(G'_0, \dots, G'_t)| / |E(G^k_0, \dots, G^k_t)|$, where $E(G'_0, \dots, G'_t)$ are the edges of the perturbed graph sequence and $E(G^k_0, \dots, G^k_t)$ are the edges of the k-hop graph sequence. Compared with the static baseline approach, LinkMirage selectively reuses information from previously perturbed graphs, leading to a smaller overall sampling probability $p$, which makes it harder to perfectly de-anonymize the graph sequence. For example, the average sampling probability $p$ for the Google+ dataset (with k = 2) is 0.431 for LinkMirage and 0.973 for the static method. For the Facebook temporal dataset (with k = 3), the average sampling probability $p$ is 0.00012 for LinkMirage and 0.00181 for the static method. Therefore, LinkMirage is more resilient against de-anonymization attacks even when applied to vertex anonymity, with up to a 10x improvement.
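The sampling-probability estimate can be illustrated on toy data. The Python sketch below assumes that the "edges of a sequence" means the union of the edge sets over all snapshots; that reading, the helper names, and the graphs are assumptions for illustration. It contrasts reusing the same perturbed edges (as LinkMirage does for unchanged parts of the graph) with re-perturbing each snapshot independently (as the static baseline does).

```python
def edge_set(adj):
    """Undirected edges of a graph {node: set(neighbours)}, as frozensets."""
    return {frozenset((v, u)) for v in adj for u in adj[v]}


def sampling_probability(perturbed_seq, k_hop_seq):
    """Assumed reading of p: |union of perturbed edges| / |union of k-hop edges|."""
    perturbed = set().union(*(edge_set(g) for g in perturbed_seq))
    k_hop = set().union(*(edge_set(g) for g in k_hop_seq))
    return len(perturbed) / len(k_hop)


# Hypothetical toy data: a fixed 2-hop graph observed over two snapshots.
G_k = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
first = {0: {2}, 1: {3}, 2: {0}, 3: {1}}
fresh = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

p_reuse = sampling_probability([first, first], [G_k, G_k])  # perturbed edges reused
p_fresh = sampling_probability([first, fresh], [G_k, G_k])  # re-perturbed independently
print(p_reuse, p_fresh)
```

Reuse keeps the union of released edges small (p = 0.4 here), while independent re-perturbation exposes twice as many k-hop edges.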


C. Sybil Defenses 

Next, we consider Sybil defense systems, which leverage published social topologies to detect fake accounts in social networks. Here, we analyze how the use of a perturbed graph changes the Sybil detection performance of SybilLimit [43], a representative Sybil defense system. Each user can query LinkMirage for his/her perturbed friends to set up the implementation of SybilLimit. Fig. 9(a) depicts the false positives (honest users misclassified as Sybils) with respect to the random walk length in the SybilLimit protocol. Fig. 9(b) shows the final attack edges with respect to the attack edges in the original topology. The false positive rate is much lower for the perturbed graphs than for the original graph, while the number of attack edges stays roughly the same for the original graph and the perturbed graphs. The number of Sybil identities that an adversary can insert is given by S = g'·w', where g' is the number of attack edges and w' is the random walk parameter in the protocol. Since g' stays almost invariant and the random walk parameter w' (for any desired false positive rate) is reduced, LinkMirage improves Sybil resilience while protecting the privacy of social relationships, so that Sybil defense protocols remain applicable (similar to static approaches, whose Sybil resilience has been demonstrated in previous work).

TABLE III. Modularity of Perturbed Graph Topologies

Google+      Original Graph   LinkMirage k = 2   LinkMirage k = 5    Mittal et al. k = 2   Mittal et al. k = 3
Modularity   0.605            ?                  0.603               ?                     0.586

Facebook     Original Graph   LinkMirage k = 5   LinkMirage k = 20   Mittal et al. k = 3   Mittal et al. k = 20
Modularity   0.488            0.479              0.487               0.376                 0.3?3


D. Privacy-preserving Graph Analytics [?], [?]

Next, we demonstrate that LinkMirage can also benefit OSN providers for privacy-preserving graph analytics. Previous work [?], [?] has demonstrated that running graph analytic algorithms can also leak information. To mitigate such privacy degradation, OSN providers could add perturbations (noise) to the outputs of these graph analytics. However, if an OSN provider aims to run multiple graph analytics, adding perturbations to each output becomes complicated. Instead, the OSN provider can first obtain the perturbed graph using LinkMirage and then run these graph analytics in a privacy-preserving manner.

Here, we first consider PageRank [35] as an effective graph metric. For the Facebook dataset, the average differences between the perturbed PageRank score and the original PageRank score are 0.0016 and 0.0018 for k = 5 and k = 20, respectively, in LinkMirage. In comparison, the average differences are 0.0019 and 0.0087 for k = 5 and k = 20 in the approach of Mittal et al. LinkMirage thus preserves the PageRank score of the original graph with up to 4x improvement over previous methods. Next, we show the modularity (computed at timestamp t = 3 for the Google+ dataset and the Facebook dataset) in Table III. LinkMirage preserves both the PageRank score and the modularity of the original graph, while the method of Mittal et al. degrades such graph analytics, especially for larger perturbation parameter k (recall the visual intuition of LinkMirage in Fig. [?]).

TABLE IV. Graph Metrics of the Original and the Perturbed Graphs for the Google+ Dataset.

                     Clustering Coefficient   Assortativity Coefficient
Original Graph       0.2612                   -0.0152
LinkMirage k = 2     0.2263                   -0.0185
LinkMirage k = 5     0.1829                   -0.0176
LinkMirage k = 10    0.0864                   -0.0092
LinkMirage k = 20    0.0136                   -0.0063
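The PageRank comparison above reports an "average difference" between original and perturbed scores. The Python sketch below assumes this means the mean absolute per-node difference, and computes PageRank by plain power iteration with damping 0.85 (a common default, assumed here rather than taken from the text). The two five-node graphs are hypothetical and share the same node set and edge count.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph {node: set(neighbours)}.
    Assumes no isolated nodes, so every node has at least one out-link."""
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(adj[v])
            for u in adj[v]:
                new_rank[u] += share
        rank = new_rank
    return rank


def average_pagerank_difference(original, perturbed):
    """Mean absolute per-node PageRank difference (assumed reading of 'average difference')."""
    a, b = pagerank(original), pagerank(perturbed)
    return sum(abs(a[v] - b[v]) for v in a) / len(a)


# Hypothetical toy graphs on the same five nodes.
original = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
perturbed = {0: {2, 3}, 1: {2}, 2: {0, 1, 4}, 3: {0, 4}, 4: {2, 3}}
print(round(average_pagerank_difference(original, perturbed), 4))
```

For graphs with millions of nodes, a sparse-matrix or GraphChi-style implementation would replace these Python dictionaries.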


E. Summary for Applications of LinkMirage

• LinkMirage preserves the privacy of users' social contacts while enabling the design of social relationship based applications. Compared to previous methods, LinkMirage results in significantly lower attack probabilities (by up to a factor of 2) when applied to anonymous communication, and higher resilience to de-anonymization attacks (by up to a factor of 10) when applied to vertex anonymity systems.

• Surprisingly, LinkMirage even improves Sybil detection performance when applied to the distributed SybilLimit system.

• LinkMirage preserves utility for multiple graph analytics applications, such as PageRank score and modularity, with up to 4x improvement.


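Definition 4: The Utility Distance (UD) of a perturbed graph sequence $G'_0, \dots, G'_T$ with respect to the original graph sequence $G_0, \dots, G_T$ and an application parameter $l$ is defined as

$$UD(G'_0, \dots, G'_T, G_0, \dots, G_T, l) = \sum_{t=0}^{T} \frac{1}{|V_t|} \sum_{v \in V_t} \left\| P_t^{l}(v) - P_t'^{\,l}(v) \right\|_{TV} \tag{5}$$

where $P_t^{l}(v)$ and $P_t'^{\,l}(v)$ denote the distributions of an $l$-hop random walk starting from $v$ in $G_t$ and $G'_t$, respectively.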


Our definition of utility distance in Eq. (5) is intuitively 
reasonable for a broad class of real-world applications, and 
captures the behavioral differences of l-hop random walks 
between the original graphs and the perturbed graphs. We 
note that random walks are closely linked to the structural 
properties of social networks. In fact, many social network 
based security applications, such as Sybil defenses [43] and 
anonymity systems [?], directly perform random walks in 
their protocols. The parameter l is application specific; for 
applications that require access to fine-grained local 
structures, such as recommendation systems [?], the value of l 
should be small. For other applications that utilize the coarse, 
macro structure of the social graphs, such as Sybil defense 
mechanisms, l can be set to a larger value (typically around 
10 in [43]). Therefore, this utility metric can quantify the 
utility performance of LinkMirage for various applications in 
a general manner. 
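As a sketch of how Eq. (5) can be evaluated on small graphs, the following Python code computes the utility distance with dense NumPy matrices. It rests on our own assumptions, not the paper's: each perturbed snapshot $G'_t$ has the same vertex set and vertex ordering as $G_t$, no vertex is isolated, and the TV distance between two rows is half their L1 distance. The function names are hypothetical.

```python
import numpy as np


def transition_matrix(adj: np.ndarray) -> np.ndarray:
    """Random-walk transition matrix P = D^{-1} A (assumes no isolated vertices)."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / deg


def snapshot_utility_distance(adj: np.ndarray, adj_pert: np.ndarray, l: int) -> float:
    """Average over vertices v of ||P^l(v) - (P')^l(v)||_TV for one snapshot."""
    p_l = np.linalg.matrix_power(transition_matrix(adj), l)
    q_l = np.linalg.matrix_power(transition_matrix(adj_pert), l)
    # TV distance between two distributions is half of their L1 distance.
    tv_per_vertex = 0.5 * np.abs(p_l - q_l).sum(axis=1)
    return float(tv_per_vertex.mean())


def utility_distance(seq, seq_pert, l: int) -> float:
    """Eq. (5): average the per-snapshot distance over t = 0, ..., T."""
    dists = [snapshot_utility_distance(a, b, l) for a, b in zip(seq, seq_pert)]
    return float(np.mean(dists))
```

Dense matrix powers cost $O(|V|^3)$ per multiplication, so this sketch only suits toy graphs; graphs of the size used in this paper would require sparse matrices or sampled random walks.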

Note that LinkMirage is not limited to preserving the 
community structure of the original graphs. We evaluate two 
representative graph-theoretic metrics, the clustering 
coefficient and the assortativity coefficient [?], as listed in 
Table I. We can see that LinkMirage preserves such 
fine-grained structural properties well for smaller 
perturbation parameter k. Therefore, the extent to which these 
utility properties are preserved depends on the perturbation 
parameter k. 
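To reproduce the two metrics on small graphs, the following pure-Python sketch computes the average local clustering coefficient and the degree assortativity coefficient (the Pearson correlation of the degrees at the two ends of each edge). Two choices here are our assumptions, since the paper does not state which variant it uses: the adjacency format (a dict mapping each vertex to a set of neighbors) and the convention that vertices of degree below 2 contribute a clustering coefficient of 0.

```python
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient; vertices with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbors of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation of endpoint degrees, each edge counted in both directions."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5
```

On a regular graph every endpoint has the same degree, so the assortativity is undefined and this sketch raises `ZeroDivisionError`.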


VI. Utility Analysis 

Following the application analysis in Section V, we aim 
to develop a general metric to characterize the utility of 
the perturbed graph topologies. Furthermore, we theoretically 
analyze the lower bound on utility for LinkMirage, uncover 
connections between our utility metric and structural properties 
of the graph sequence, and experimentally analyze our metric 
using the real-world Google+ and Facebook datasets. 

A. Metrics 

We aim to formally quantify the utility provided by 
LinkMirage so as to encompass a broader range of applications. 
One intuitive global utility metric is the degree of vertices. 
Interestingly, the expected degree of each node in the 
perturbed graph is the same as its original degree; we defer 
the proof to the Appendix to improve readability. 

Theorem 4: The expected degree of each node after 
perturbation by LinkMirage is the same as in the original graph: 
$\forall v \in V_t$, $E(\deg'(v)) = \deg(v)$, where $\deg'(v)$ denotes the 
degree of vertex $v$ in $G'_t$. 
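The intuition behind degree preservation under random-walk-based perturbation can be checked numerically. The sketch below does not implement LinkMirage. It analyzes a hypothetical simplified rewiring in which every edge $(u, v)$ keeps a uniformly chosen endpoint as the anchor and replaces the other endpoint by the end of a $(k-1)$-hop random walk started from it, with self-loops and duplicate edges kept. Under that assumption, a vertex $x$ stays the anchor of half of its $\deg(x)$ edges in expectation, and it also receives $\frac{1}{2}\sum_v \deg(v) P^{k-1}(v, x) = \frac{1}{2}\deg(x)$ walk endpoints, because the degree vector is stationary for the random walk on an undirected graph.

```python
import numpy as np


def expected_degree_after_rw_rewiring(adj: np.ndarray, k: int) -> np.ndarray:
    """Expected degrees under the hypothetical simplified rewiring described above.

    Assumes a symmetric adjacency matrix with no isolated vertices.
    """
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]
    walk = np.linalg.matrix_power(P, k - 1)
    # Anchor contribution: each of x's deg(x) edges keeps x as anchor with prob. 1/2.
    anchor = 0.5 * deg
    # Walk contribution: the directed edge (u -> v) is used with prob. 1/2, and its
    # walk from v lands on x with prob. walk[v, x]; v heads deg(v) directed edges.
    landed = 0.5 * deg @ walk
    return anchor + landed
```

The equality holds for any walk length because `deg @ P` equals `deg` for a symmetric adjacency matrix, so every power of `P` also leaves `deg` unchanged.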

To understand the utility at a finer-grained level, we further 
define our utility metric as the Utility Distance (UD) in Definition 4. 


B. Relationships with Other Graph Structural Properties 

The mixing time $\tau_{G_t}(\epsilon)$ measures the time required for 
the Markov chain to converge to its stationary distribution $\pi_t$, and 
is defined as $\tau_{G_t}(\epsilon) = \min\{r : \max_v \|P_t^r(v) - \pi_t\|_{TV} < \epsilon\}$. 
Based on the Perron-Frobenius theory, we denote the eigenvalues 
of $P_t$ as $1 = \mu_1(G_t) \ge \mu_2(G_t) \ge \cdots \ge \mu_{|V_t|}(G_t) \ge -1$. 
The convergence rate of the Markov chain to $\pi_t$ is determined 
by the second largest eigenvalue modulus (SLEM), 
$\mu_{G_t} = \max\big(\mu_2(G_t), -\mu_{|V_t|}(G_t)\big)$. 
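Both quantities can be computed directly for small graphs, which may help build intuition for Theorems 5 and 6. In the sketch below (function names are ours), the SLEM comes from the symmetric matrix $D^{-1/2} A D^{-1/2}$. That matrix is similar to $P_t = D^{-1} A$, so it has the same eigenvalues. The mixing time is found by iterating $P_t^r$ until the worst-case TV distance to $\pi_t$ (proportional to degree) drops below $\epsilon$. We assume a connected, undirected graph without isolated vertices.

```python
import numpy as np


def slem(adj: np.ndarray) -> float:
    """Second largest eigenvalue modulus max(mu_2, -mu_n) of P = D^{-1} A."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # N = D^{-1/2} A D^{-1/2} is symmetric and similar to P, so eigvalsh applies.
    N = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eig = np.sort(np.linalg.eigvalsh(N))[::-1]  # descending; eig[0] == 1
    return float(max(eig[1], -eig[-1]))


def mixing_time(adj: np.ndarray, eps: float, max_steps: int = 10_000) -> int:
    """Smallest r with max_v ||P^r(v) - pi||_TV < eps."""
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]
    pi = deg / deg.sum()  # stationary distribution of the walk
    Pr = np.eye(len(adj))
    for r in range(max_steps + 1):
        if (0.5 * np.abs(Pr - pi).sum(axis=1)).max() < eps:
            return r
        Pr = Pr @ P
    raise ValueError("walk did not mix within max_steps (the graph may be bipartite)")
```

For a bipartite graph such as a 4-cycle, $\mu_{|V_t|} = -1$, so the SLEM equals 1 and the walk never converges. In that case `mixing_time` raises an error instead of looping forever.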

Since our utility distance is defined using the transition 
probability matrix $P_t$, this metric can be proved to be closely 
related to structural properties of the graphs, as shown in 
Theorem 5 and Theorem 6. 

Theorem 5: Let us denote the utility distance between 
the perturbed graph $G'_t$ and the original graph $G_t$ by 
$UD(G_t, G'_t, l)$. Then we have 
$\tau_{G'_t}\big(UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon\big) \ge \tau_{G_t}(\epsilon)$. 

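Theorem 6: Let us denote the second largest eigenvalue 
modulus (SLEM) of the transition probability matrix $P_t$ of 
graph $G_t$ as $\mu_{G_t}$. We can bound the SLEM of a perturbed 
graph $G'_t$ using the mixing time of the original graph and 
the utility distance between the graphs as 

$$\mu_{G'_t} \ge 1 - \frac{\log n + \log \frac{1}{UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon}}{\tau_{G_t}(\epsilon)}$$

where $n$ denotes the number of vertices. 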
















Fig. 10. (a), (b) show the utility distances using the Google+ dataset and the Facebook dataset, respectively. A larger perturbation parameter k results in a larger 
utility distance. A larger application parameter l decreases the distance, which shows the effectiveness of LinkMirage in preserving global community structures. 


C. Upper Bound of Utility Distance 

LinkMirage aims to limit the degradation of link privacy 
over time. Usually, mechanisms that preserve privacy trade 
off application utility. In the following, we theoretically 
derive an upper bound on the utility distance for our algorithm. 
This corresponds to a lower bound on the utility that 
LinkMirage is guaranteed to provide. 

Theorem 7: The utility distance of LinkMirage is upper 
bounded by 2l times the sum of the utility distance of each 
community $\epsilon$ and the ratio cut $\delta_t$ for each $G_t$, i.e., 

$$UD(G_0, \dots, G_T, G'_0, \dots, G'_T, l) \le \frac{1}{T+1} \sum_{t=0}^{T} 2l(\epsilon + \delta_t) \quad (6)$$

where $\delta_t$ denotes the number of inter-community links over 
the number of vertices, and each community $C_k(t)$ within $G_t$ 
satisfies $\|P_t(k,k) - P'_t(k,k)\|_{TV} \le \epsilon$, with $P_t(k,k)$ the 
transition probability matrix of $C_k(t)$. We defer the proofs to the 
Appendix to improve readability. 
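The right-hand side of Eq. (6) is easy to evaluate once a community partition is fixed. The sketch below (hypothetical helper names) computes the ratio cut $\delta_t$ as defined after Theorem 7 and averages $2l(\epsilon + \delta_t)$ over the snapshots. Here $\epsilon$ is taken as a given input rather than measured.

```python
def ratio_cut(edges, community, num_vertices):
    """delta_t: number of inter-community links divided by the number of vertices."""
    inter = sum(1 for u, v in edges if community[u] != community[v])
    return inter / num_vertices


def ud_upper_bound(deltas, eps, l):
    """Right-hand side of Eq. (6): mean over snapshots of 2 * l * (eps + delta_t)."""
    return sum(2 * l * (eps + d) for d in deltas) / len(deltas)


# Toy snapshot: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
delta = ratio_cut(edges, community, 6)       # 1 / 6
bound = ud_upper_bound([delta], 0.05, 2)     # 4 * (0.05 + 1/6), about 0.87
```

UD is an average of TV distances, so it never exceeds 1. In this toy example the bound is about 0.87 for $l = 2$ and already exceeds 1 for $l = 3$, which is consistent with the bound being rather loose.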

Note that an upper bound on utility distance corresponds 
to a lower bound on the utility of our algorithm. Since better 
privacy usually requires adding more noise to the original 
sequence to obtain the perturbed sequence, this bound shows 
that LinkMirage is nevertheless guaranteed to provide a 
minimum level of utility. 

In the derivation, we do not take specific evolutionary 
patterns, such as the overlapped ratio, into consideration; 
therefore, our theoretical upper bound is rather loose. Next, 
we show that in practice, LinkMirage achieves a smaller 
utility distance (higher utility) than the baseline approach of 
independent static perturbations. 

D. Utility Experiments Analysis 

Fig. 10(a) and (b) depict the utility distance for the Google+ and 
the Facebook graph sequences, for varying perturbation degree 
k and application parameter l. We can see that as k 
increases, the distance metric increases. This is natural, 
since additional noise increases the distance between the 
probability distributions computed from the original and the 
perturbed graph series. As the application parameter l increases, 
the distance metric decreases. This illustrates that LinkMirage is 
more suited for security applications that rely on macro 
structures, as opposed to applications that require exact 
information about one- or two-hop neighborhoods. Furthermore, 
our experimental results in Fig. [?] and Table II also 
demonstrate the utility advantage of LinkMirage over the 
approach of Mittal et al. in real-world applications. 

VII. Related Work 

Privacy with labeled vertices: An important thread of research 
aims to preserve link privacy between labeled vertices by 
obfuscating the edges, i.e., by adding/deleting edges 
[29], [42]. These methods aim to randomize the structure of 
the social graph, while differing in the manner of adding noise. 
Hay et al. [?] perturb the graph by applying a sequence of 
r edge deletions and r edge insertions. The deleted edges 
are uniformly selected from the existing edges in the original 
graph, while the added edges are uniformly selected from the 
non-existing edges. However, neither the edge deletions nor 
the edge insertions take any structural properties of the graph 
into consideration. Ying and Wu proposed a new perturbation 
method for preserving spectral properties, without analyzing 
its privacy performance. 
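The random edge deletion and insertion scheme of Hay et al. described above can be sketched as follows. This is our illustration, not their code. The function name is hypothetical, vertices are assumed to be labeled 0 to n-1, and insertions are assumed to be drawn from pairs that are non-edges in the original graph, so a deleted edge is never re-inserted.

```python
import random


def random_add_delete(edges, num_vertices, r, seed=None):
    """Delete r uniformly chosen existing edges and insert r uniformly chosen non-edges.

    Returns the perturbed edge set as frozensets {u, v}.
    """
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}
    # Sort for a reproducible population order before sampling.
    deleted = set(rng.sample(sorted(original, key=sorted), r))
    non_edges = [
        frozenset((u, v))
        for u in range(num_vertices)
        for v in range(u + 1, num_vertices)
        if frozenset((u, v)) not in original
    ]
    added = set(rng.sample(non_edges, r))
    return (original - deleted) | added
```

Enumerating all non-edges is quadratic in the number of vertices. On large sparse graphs one would instead rejection-sample random vertex pairs.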

Mittal et al. proposed a perturbation method in [29], which 
serves as the foundation for our algorithm. Their method 
deletes all edges in the original graph, and replaces each edge 
with a fake edge that is sampled based on the structural 
properties of the graph. In particular, random walks are performed 
on the original graph to sample fake edges. Compared to the 
methods of Hay et al. [?] and Mittal et al. [29], LinkMirage 
provides up to 3x privacy improvement for static social graphs 
and up to 10x privacy improvement for dynamic social graphs. 
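The edge-replacement idea described for Mittal et al. [29] can be sketched as follows. This is a simplified, hypothetical variant for illustration only, and the helper names `random_walk` and `rw_perturb` are ours. It does not try to reproduce the exact sampling rules of [29], and it keeps any self-loops or duplicate edges that the walks produce.

```python
import random


def random_walk(adj, start, steps, rng):
    """Walk `steps` hops from `start`, picking a uniformly random neighbor each hop."""
    v = start
    for _ in range(steps):
        v = rng.choice(adj[v])
    return v


def rw_perturb(adj, k, seed=None):
    """Replace every original edge (u, v) by a fake edge (u, z).

    u is a uniformly chosen endpoint, and z ends a (k-1)-hop walk from the
    other endpoint. `adj` maps each vertex to a list of its neighbors.
    """
    rng = random.Random(seed)
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    fake = []
    for u, v in sorted(edges):
        if rng.random() < 0.5:
            u, v = v, u  # choose the anchor endpoint uniformly
        z = random_walk(adj, v, k - 1, rng)
        fake.append((u, z))  # may be a self-loop or duplicate; not filtered here
    return fake
```

With k = 1 the walk has zero hops, so every fake edge coincides with the original edge. Larger k lets the fake endpoint drift further through the graph, and the perturbation becomes stronger.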

Another line of research aims to preserve link privacy [?], 
[44] by aggregating the vertices and edges into super vertices. 
Therefore, the privacy of links within each super vertex is 
naturally protected. However, such approaches do not permit 
fine-grained utilization of graph properties, which makes them 
difficult to apply to applications such as social network based 
anonymous communication and Sybil defenses. 
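A minimal sketch of the aggregation idea, assuming a given assignment of vertices to super vertices (the function name and output format are hypothetical): only per-group link counts survive, so the specific pairs linked inside a super vertex are hidden. The same loss of detail is why vertex-level operations such as individual random walks are no longer possible.

```python
from collections import Counter


def aggregate(edges, group):
    """Collapse vertices into super vertices.

    Returns (intra, inter): intra[g] counts links inside super vertex g, and
    inter[(g, h)] with g < h counts links between super vertices g and h.
    """
    intra, inter = Counter(), Counter()
    for u, v in edges:
        gu, gv = group[u], group[v]
        if gu == gv:
            intra[gu] += 1
        else:
            inter[tuple(sorted((gu, gv)))] += 1
    return intra, inter
```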

Privacy with unlabeled vertices: While the focus of our paper 
is on preserving link privacy in the context of labeled vertices, 
an orthogonal line of research aims to provide privacy in 
the context of unlabeled vertices (vertex privacy) [?], [13], 
[36]. Liu et al. [?] proposed k-anonymity to anonymize 
unlabeled vertices by placing at least k vertices at an equivalent 
level. Differential privacy provides a theoretical framework 
for perturbing aggregate information, and Sala et al. [?] 
leveraged differential privacy to privately publish social graphs 
with unlabeled vertices. We note that LinkMirage can also 
provide a foundation for preserving vertex privacy, as stated 
in Section V-B. Shokri et al. [?] address the privacy-utility 
trade-off using game theory, which does not consider the 
temporal scenario. 
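One common instantiation of placing vertices "at an equivalent level" uses vertex degree as the equivalence criterion (k-degree anonymity). The check below assumes that criterion, which may differ from the exact notion in the cited work; the function name is hypothetical. A graph passes if every degree value is shared by at least k vertices.

```python
from collections import Counter


def is_k_degree_anonymous(adj, k):
    """True if every degree value occurs for at least k vertices."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return all(c >= k for c in counts.values())
```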

We further consider anonymity in temporal graphs with 
unlabeled vertices. Time series data must be considered 
carefully, since adversaries can combine multiple published 
graphs to launch enhanced attacks that infer more information. 
Prior studies [?] explored privacy degradation in vertex privacy 
schemes due to the release of multiple graph snapshots. These 
observations motivate our work, even though we focus on 
labeled vertices. 

De-anonymization: In recent years, the security community 
has proposed a number of sophisticated attacks for 
de-anonymizing social graphs [?]. While most of these attacks 
are not applicable to link privacy mechanisms (their focus is 
on vertex privacy), they illustrate the importance of considering 
adversaries with prior information about the social graph. We 
perform a rigorous privacy analysis of LinkMirage (Section IV) 
by considering a worst-case (strongest) adversary that knows 
the entire social graph except one link, and show that even 
such an adversary is limited in its inference capability. 

VIII. Discussion 

Privacy Utility Tradeoffs: LinkMirage mediates privacy-preserving 
access to users’ social relationships. In our privacy 
analysis, we consider the worst-case adversary who knows 
the entire social link information except one link, which 
conservatively demonstrates the superiority of our algorithm 
over the state-of-the-art approaches. LinkMirage benefits many 
applications that depend on graph-theoretic properties of the 
social graph (as opposed to the exact set of edges), including 
recommendation systems and E-commerce applications. 

Broad Applicability: While our theoretical analysis of 
LinkMirage relies on undirected links, the obfuscation 
algorithm itself can be generally applied to directed social 
networks. Furthermore, our underlying techniques have broad 
applicability to domains beyond social networks, including 
communication networks and web graphs. 

IX. Conclusion 

LinkMirage effectively mediates privacy-preserving access 
to users’ social relationships, since 1) LinkMirage preserves 
key structural properties in the social topology while 
anonymizing intra-community and inter-community links; 2) 
LinkMirage provides rigorous guarantees for anti-inference 
privacy, indistinguishability, and anti-aggregation privacy, in 
order to defend against sophisticated threat models for both 
static and temporal graph topologies; 3) LinkMirage 
significantly outperforms baseline static techniques in terms of both 
link privacy and utility, as verified both theoretically and 
experimentally using the real-world Facebook dataset 
(with 870K links) and the large-scale Google+ dataset (with 
940M links). LinkMirage enables the deployment of real-world 
social relationship based applications such as graph analytics, 
anonymity systems, and Sybil defenses while preserving the 
privacy of users’ social relationships. 
X. Appendix 

A. Proof of the Upper Bound of Anti-aggregation Privacy 

||P,'= - P/IItv = - ^*'=IItv < ^El^lllPtWPC - 

Pt{v)pp^h + ^eI^IiiAwpC' - PtWPdli_ = 

ll^t — AIItv + \\Pt ^ ~ ^IItv < — AIItv- 

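B. Relationships with Differential Privacy. When considering differential privacy for a time series of graphs $\{G_i\}_{i=0}^{t}$, we have $f(D) = P(\{G'_i\}_{i=0}^{t} \mid \{G_i(L_t)\}_{i=0}^{t}, L_t = 1)$ and $f(D') = P(\{G'_i\}_{i=0}^{t} \mid \{G_i(L_t)\}_{i=0}^{t}, L_t = 0)$. For good privacy performance, we need

$$P(\{G'_i\}_{i=0}^{t} \mid \{G_i(L_t)\}_{i=0}^{t}, L_t = 1) \approx P(\{G'_i\}_{i=0}^{t} \mid \{G_i(L_t)\}_{i=0}^{t}, L_t = 0).$$

Omitting the index range $i = 0, \dots, t$ for brevity, the probability of $\{G'_i\}$ given $\{G_i(L_t)\}$ is

$$P(\{G'_i\} \mid \{G_i(L_t)\}) = P(\{G'_i\} \mid \{G_i(L_t)\}, L_t = 1)\,P(L_t = 1 \mid \{G_i(L_t)\}) + P(\{G'_i\} \mid \{G_i(L_t)\}, L_t = 0)\,P(L_t = 0 \mid \{G_i(L_t)\}).$$

It is easy to see that if the condition for good privacy performance holds, we have

$$P(L_t = 1 \mid \{G'_i\}, \{G_i(L_t)\}) = \frac{P(\{G'_i\} \mid \{G_i(L_t)\}, L_t = 1)\,P(L_t = 1 \mid \{G_i(L_t)\})}{P(\{G'_i\} \mid \{G_i(L_t)\})} \approx P(L_t = 1 \mid \{G_i(L_t)\}),$$

which is the same as in Definition [?] and means that the posterior probability is similar to the prior probability, i.e., the adversary is bounded in the information it can learn from the perturbed graphs. 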
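The Bayes step in this argument can be made concrete with a short numerical sketch (names are hypothetical). Here `prior` stands for $P(L_t = 1 \mid \{G_i(L_t)\})$, and the two likelihoods stand for $P(\{G'_i\} \mid \{G_i(L_t)\}, L_t = 1)$ and $P(\{G'_i\} \mid \{G_i(L_t)\}, L_t = 0)$. When the two likelihoods are equal, the posterior equals the prior. The more they differ, the further the posterior moves away from the prior.

```python
def posterior_link(prior, like_link, like_no_link):
    """P(L_t = 1 | perturbed sequence, adversary knowledge) by Bayes' rule.

    The denominator is the law of total probability over L_t in {0, 1}.
    """
    evidence = like_link * prior + like_no_link * (1 - prior)
    return like_link * prior / evidence


same = posterior_link(0.3, 0.2, 0.2)     # equal likelihoods: posterior equals prior
skewed = posterior_link(0.3, 0.4, 0.1)   # likelihood ratio 4 raises the posterior
```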

C. Proof of Theorem 4 (Expectation of Perturbed Degree). According to 

Theorem 3 in j^, we have E(deg^o^('f;)) = deg('f;), where deg'^^^{v) 
denotes the degree of v after perturbation within community. Then we 
consider the random perturbation for inter-community subgraphs. Since 
the probability for an edge to be chosen is (d) ^ ^^ ^ 

the expected degree after inter-community perturbation satisfies 

deg{va(i)) deg(^;b (f)) (| + I) 

“ 2^j Kl^a | + |^^b I) 

= deg('f;a(^)). Combining with the expectations under static scenario, we 
have E(deg'(u)) = deg(u). 

D. Proof of the Upper Bound for the Utility Distance. We first introduce 
some notation and concepts. We consider two perturbation methods in 
the derivation below. The first is our dynamic perturbation 
method, which takes the graph evolution into consideration. The second 
is an intermediate method, where we only implement dynamic 
clustering without selective perturbation. That is to say, we cluster $G_t$, then 
perturb each community by the static method and each inter-community 
subgraph by randomly connecting the marginal nodes, independently. We 
denote the perturbed graphs corresponding to the dynamic and the intermediate 
methods by $G'_t$ and $\tilde{G}_t$, respectively. Similarly, we denote the perturbed TPMs 
for the two approaches by $P'_t$ and $\tilde{P}_t$. To simplify the derivation, we 
partition the proof into two stages. In the first stage, we derive the UD upper 
bound for the intermediate perturbation method. In the second stage, we 
derive the relationship between $G'_t$ and $\tilde{G}_t$. Results from the two stages can 
be combined to find the upper bound for the utility distance of LinkMirage. 
Denoting the communities as $C_1, C_2, \dots, C_{K_t}$ and the inter-community 
subgraphs as $C_{12}, C_{13}, \dots$, we have 


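$$\|P_t - \tilde{P}_t\|_{TV} = \left\| \begin{bmatrix} P_t(1,1) - \tilde{P}_t(1,1) & \cdots & P_t(1,K_t) - \tilde{P}_t(1,K_t) \\ \vdots & \ddots & \vdots \\ P_t(K_t,1) - \tilde{P}_t(K_t,1) & \cdots & P_t(K_t,K_t) - \tilde{P}_t(K_t,K_t) \end{bmatrix} \right\|_{TV}$$

$$\le \frac{1}{|V_t|} \sum_{k=1}^{K_t} |V_t(k)|\, \|P_t(k,k) - \tilde{P}_t(k,k)\|_{TV} + \frac{1}{|V_t|} \sum_{j \ne k} |E_t(k,j)|\, \|P_t(k,j) - \tilde{P}_t(k,j)\|_{TV} \le \epsilon + \delta_t$$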





Here, $\delta_t$ is the ratio cut of the graph $G_t$, i.e., $\delta_t = \sum_{j \ne k} |E_t(k,j)| / |V_t|$. For 
transition probability matrices $P$ and $Q$, we have $\|P^l - Q^l\|_{TV} \le l\|P - Q\|_{TV}$. 
Combining the above results, we have 

$$UD(G_t, \tilde{G}_t, l) \le l\|P_t - \tilde{P}_t\|_{TV} \le l(\epsilon + \delta_t) \quad (8)$$


Then, we generalize the utility analysis of the intermediate perturbation to our 
dynamic perturbation. Assume that some of the $K_t$ clusters are 
considered changed and are perturbed independently, while the remaining 
clusters are considered unchanged, i.e., their perturbation 
follows the perturbation of $G'_{t-1}$. To simplify the derivation, we use $P_t(k)$ 
instead of $P_t(k,k)$ to represent the TPM of the $k$-th community. Then, we have 


UD(Gt,G;,l) = ||Pt -P/IItv 

Efii \\Pt(k) - P^k) IItv + E]1‘i \\Pt(j) - P^(j) IItv 

--PSi- 

- lA ( ^ ~ Pi(k)\\TV + E “ ■P(t-l)0)llTV 

+ IF(t-l)0') -•P(f-l)(i)llTV+ IF(t_l)0) -■Pt(i)llTv)) +St 

^ Ef=l \\Pt(k) -Plik)\\TV . 

\Kt\ \Kt\ ^ ‘ 

< UD(Gt, G^^, 1) + e + dt 


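where $\epsilon_0$ denotes the threshold used to classify a community as changed or 
unchanged. The last inequality comes from the fact that $\epsilon_0 \le \epsilon$. Then, we 
can prove 

$$UD(G_t, G'_t, l) = \|P_t^l - (P'_t)^l\|_{TV} \le l\|P_t - P'_t\|_{TV} \le l\|P_t - \tilde{P}_t\|_{TV} + l(\epsilon + \delta_t) \le 2l(\epsilon + \delta_t)$$

and $UD(G_0, \dots, G_T, G'_0, \dots, G'_T, l) \le \frac{1}{T+1} \sum_{t=0}^{T} 2l(\epsilon + \delta_t)$. 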

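E. Proof for Relating Utility Distance with Structural Metrics. From 
the definition of total variation distance, we have $\|P_v^r(G'_t) - \pi\|_{TV} + \|P_v^r(G_t) - \pi\|_{TV} \ge \|P_v^r(G'_t) - P_v^r(G_t)\|_{TV}$. Taking the maximum over 
all vertices, we have $\max_v \|P_v^r(G'_t) - \pi\|_{TV} + \max_v \|P_v^r(G_t) - \pi\|_{TV} \ge \max_v \|P_v^r(G'_t) - P_v^r(G_t)\|_{TV}$. Therefore, for $r = \tau_{G_t}(\epsilon)$, 

$$\max_v \|P_v^r(G'_t) - \pi\|_{TV} \ge \max_v \|P_v^r(G'_t) - P_v^r(G_t)\|_{TV} - \max_v \|P_v^r(G_t) - \pi\|_{TV} \ge \frac{\sum_v \|P_v^r(G'_t) - P_v^r(G_t)\|_{TV}}{|V_t|} - \epsilon = UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon.$$

Then, we have $\tau_{G'_t}\big(UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon\big) \ge \tau_{G_t}(\epsilon)$. It is known that the 
second largest eigenvalue modulus is related to the mixing time of the graph 
as $\tau_{G'_t}(\epsilon) \le \frac{\log n + \log(1/\epsilon)}{1 - \mu_{G'_t}}$. From this relationship, we can bound the SLEM 
in terms of the mixing time as $\mu_{G'_t} \ge 1 - \frac{\log n + \log(1/\epsilon)}{\tau_{G'_t}(\epsilon)}$. Replacing $\epsilon$ with 
$UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon$, we have 

$$\mu_{G'_t} \ge 1 - \frac{\log n + \log \frac{1}{UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon}}{\tau_{G'_t}\big(UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon\big)}.$$

Finally, we leverage $\tau_{G'_t}\big(UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon\big) \ge \tau_{G_t}(\epsilon)$ in the 
above equation to obtain 

$$\mu_{G'_t} \ge 1 - \frac{\log n + \log \frac{1}{UD(G_t, G'_t, \tau_{G_t}(\epsilon)) - \epsilon}}{\tau_{G_t}(\epsilon)}.$$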



















